(57) Abstract

Disclosed are novel isolated nucleic acids and substantially pure protein preparations for naturally occurring and synthetic or chimeric
heparan sulfate D-glucosaminyl 3-O-sulfotransferases (3-OSTs). Also disclosed are uses for these genes and proteins, including uses for
HEPARAN SULFATE D-GLUCOSAMINYL 3-O-SULFOTRANSFERASES, AND USES THEREFOR

Field of the Invention 

The present invention is related to the field of biochemistry and molecular biology, and in 
particular to the field of enzymology and heparan sulfate biosynthesis. 

Background of the Invention 

The serine proteases of the intrinsic blood coagulation cascade are slowly neutralized by antithrombin (AT) (reviewed in (1)). This inhibition is secondary to the generation of 1:1 enzyme-AT complexes whose formation is dramatically enhanced by the mast cell product heparin (2). Damus et al. (3) hypothesized that endothelial cell surface heparan sulfate proteoglycans (HSPGs) function in a similar fashion to accelerate coagulation enzyme inactivation by AT, and are therefore responsible for the non-thrombogenic properties of blood vessels. It was initially demonstrated that perfusing the hindlimbs of normal rodents and of rodents deficient in mast cells with purified thrombin (T) and AT leads to a greatly elevated rate of T-AT complex formation, and that both the enzyme heparitinase and the natural heparin antagonist platelet factor 4 suppress this acceleration (4, 5). It was subsequently shown that cultured cloned bovine macrovascular and rodent microvascular endothelial cells synthesize both anticoagulant HSPG (HSPGact) and nonanticoagulant HSPG (HSPGinact) (6-8). HSPGact bear glycosaminoglycan (GAG) chains that bind tightly to AT and accelerate T-AT complex generation (6-8).

The biosynthesis of HSPGact requires four steps:
(1) generation of a core protein;
(2) assembly of a linkage region of four neutral sugars on specific serine attachment sites of the core protein;
(3) elongation of a GAG backbone composed of alternating N-acetylglucosamine and glucuronic acid residues; and
(4) modification of this homogeneous copolymer by partial N-deacetylation with coupled N-sulfation of glucosamine residues, partial epimerization of glucuronic acid to iduronic acid residues, partial 2-O-sulfation of uronic acid residues, and partial 6-O-sulfation and partial 3-O-sulfation of glucosamine residues (reviewed in (9)).

This multienzyme pathway generates HSPGact with regions of defined structure that contain the primary AT-binding domain sequence found in anticoagulant heparin: uronic acid→glucosamine (N-acetyl/N-sulfate) 6-O-sulfate→glucuronic acid→glucosamine N-sulfate 3-O-sulfate (6-O-sulfate)→iduronic acid 2-O-sulfate→glucosamine N-sulfate 6-O-sulfate (10-17). These reactions also produce HSPGinact with regions of varying monosaccharide sequence that lack the primary AT-binding domain. The structure-function relationships of the AT-binding domain have been elucidated with heparin/heparan sulfate oligosaccharides, using fast reaction kinetics and equilibrium binding assays. The 6-O-sulfate group on residue 2 and the 3-O-sulfate group on residue 4 function in a thermodynamically linked fashion to supply half of the binding energy for interaction with AT, and trigger a conformational event that accelerates neutralization of specific coagulation proteases (11, 12). The amino and ester sulfate groups at residues 5 and 6, as well as carboxyl groups at other sites, provide the other half of the binding energy for interaction with the protease inhibitor (10, 11). Furthermore, monosaccharide sequences outside the primary AT-binding domain are essential for facilitating inhibition of coagulation proteases other than factor Xa (18, 19).

During the past eight years, several biosynthetic enzymes that generate HSPGact and HSPGinact have been purified. These proteins include an N-acetylglucosamine/glucuronic acid copolymerase (20), N-deacetylase/N-sulfotransferases (NST-1 and NST-2) (21, 22), a glucuronic acid/iduronic acid epimerase (23), an iduronic acid/glucuronic acid 2-O-sulfotransferase (2-OST) (24), a glucosamine 6-O-sulfotransferase (6-OST) (25) and a glucosamine 3-O-sulfotransferase (3-OST) (26, 35). However, the only enzymes that have also been molecularly cloned are the 2-OST, the epimerase, and two structurally and functionally distinct isoforms of N-deacetylase/N-sulfotransferase (NST-1 from liver and NST-2 from mastocytoma) (27-31). These enzymes must function in a coordinated manner to produce the AT-binding domain, because the abundance of this sequence is much greater than predicted from a random assembly of constituents (32). The postulated regulatory mechanism must direct the biosynthetic enzymes to carry out the appropriate sequence of epimerization/sulfation reactions to generate the AT-binding domain (33, 34).

Summary of the Invention
The present invention depends, in part, upon the identification and molecular cloning of novel genes encoding mammalian heparan sulfate D-glucosaminyl 3-O-sulfotransferases (3-OSTs). In particular, as disclosed herein, the present invention provides nucleic acid (SEQ ID NO: 1) and amino acid (SEQ ID NO: 2) sequences for murine 3-OST-1; nucleic acid (SEQ ID NO: 3) and amino acid (SEQ ID NO: 4) sequences for human 3-OST-1; nucleic acid (SEQ ID NO: 5) and amino acid (SEQ ID NO: 6) sequences for human 3-OST-2; nucleic acid (SEQ ID NO: 7) and amino acid (SEQ ID NO: 8) sequences for human 3-OST-3A; nucleic acid (SEQ ID NO: 9) and amino acid (SEQ ID NO: 10) sequences for human 3-OST-3B; and nucleic acid (SEQ ID NO: 11) and amino acid (SEQ ID NO: 12) sequences for human 3-OST-4. In addition, the invention provides the amino acid sequence (SEQ ID NO: 15) of a C. elegans homologue, ce3-OST.

Thus, in one aspect, the present invention provides isolated nucleic acids encoding at least a functional fragment of a 3-OST protein. In preferred embodiments, the nucleic acid encodes a 3-OST protein comprising a mature murine or human 3-OST-1. In other embodiments, the nucleic acid encodes a 3-OST protein selected from 3-OST-1, 3-OST-2, 3-OST-3A, 3-OST-3B, 3-OST-4, and ce3-OST. In other preferred embodiments, the nucleic acid encodes a 3-O-sulfotransferase domain of a 3-OST protein selected from 3-OST-1, 3-OST-2, 3-OST-3A, 3-OST-3B, 3-OST-4, and ce3-OST.

In particular embodiments, the nucleic acid comprises a nucleotide sequence selected from nucleotide sequences within:
(a) SEQ ID NO: 1;
(b) SEQ ID NO: 3;
(c) SEQ ID NO: 5;
(d) SEQ ID NO: 7;
(e) SEQ ID NO: 9;
(f) SEQ ID NO: 11;
(g) a sequence having at least 60% nucleotide sequence identity with at least one of (a)-(f) and encoding a functional fragment having sequence-specific HS binding affinity or 3-O-sulfotransferase activity; and
(h) a sequence differing from a sequence of (a)-(g) only by the substitution of synonymous codons.

In other particular embodiments, the present invention provides an isolated nucleic acid encoding a polypeptide selected from:
(a) residues 21-52, 260-269, 250-276, 53-311, or 21-307 of SEQ ID NO: 2;
(b) residues 21-48, 256-265, 246-272, 49-307, or 21-303 of SEQ ID NO: 4;
(c) residues 42-109, 313-325, 303-332, or 110-367 of SEQ ID NO: 6;
(d) residues 44-147, 351-363, 341-370, or 148-406 of SEQ ID NO: 8;
(e) residues 66-132, 336-348, 326-355, or 133-390 of SEQ ID NO: 10;
(f) residues 396-408, 386-415, or 207-456 of SEQ ID NO: 12;
(g) residues 240-250, 230-257, or 23-291 of SEQ ID NO: 15;
(h) a sequence having at least 60% amino acid sequence similarity with at least one of (a)-(g) and encoding a functional fragment having sequence-specific HS binding affinity or 3-O-sulfotransferase activity; and
(i) a sequence comprising a chimera of at least two of sequences (a)-(h).
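Alternative (h) in the first list above covers sequences that differ only by synonymous codons. The following minimal Python sketch checks whether two coding sequences of equal length translate to the same protein. It assumes the standard genetic code and ignores organism-specific codon usage. The helper names (`translate`, `is_synonymous_variant`) are illustrative only.

```python
from itertools import product

# Standard genetic code (NCBI table 1): codons enumerated in T, C, A, G order
# with the third base varying fastest. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a DNA or RNA coding sequence codon by codon."""
    seq = cds.upper().replace("U", "T")
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def is_synonymous_variant(cds_a: str, cds_b: str) -> bool:
    """True if both sequences have equal length and encode the same residues."""
    return len(cds_a) == len(cds_b) and translate(cds_a) == translate(cds_b)


# CTG and CTT both encode leucine; CCG encodes proline.
print(is_synonymous_variant("ATGCTG", "ATGCTT"))  # True
print(is_synonymous_variant("ATGCTG", "ATGCCG"))  # False
```

Comparing translations, rather than listing allowed codon swaps, keeps the check independent of which codons were changed.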

In another aspect, the present invention provides isolated nucleic acids comprising at least 16 consecutive nucleotides of a nucleotide sequence selected from SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, and SEQ ID NO: 11.
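The 16-consecutive-nucleotide criterion can be tested mechanically. The following is a minimal Python sketch, assuming plain A/C/G/T strings. It compares only the strand given and does not check reverse complements. The short repeat sequence is a toy stand-in, because the SEQ ID NOs are not reproduced here, and the function names are hypothetical.

```python
from __future__ import annotations


def kmers(seq: str, k: int) -> set[str]:
    """All length-k windows of seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_consecutive_run(candidate: str, reference: str, k: int = 16) -> bool:
    """True if candidate contains at least k consecutive nucleotides of reference.

    Only the given strand is compared; the reverse complement is not checked.
    """
    return not kmers(candidate, k).isdisjoint(kmers(reference, k))


reference = "ACGT" * 10                        # toy stand-in, not a SEQ ID NO
candidate = "GGGG" + reference[5:21] + "CCCC"  # embeds a 16-nt stretch of reference
print(shares_consecutive_run(candidate, reference))  # True
print(shares_consecutive_run("A" * 20, reference))   # False
```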

In another aspect, the present invention provides cells and cell lines transformed with the nucleic acids of the present invention. Thus, the invention provides host cells transformed with any of the above-described nucleic acids. The transformed host cells may be bacterial, yeast, or insect cells. Preferably, however, the host cells are mammalian cells, including endothelial cells, mast cells, fibroblasts, hybridomas, oocytes, and embryonic stem cells. Examples of preferred mammalian cells include:
- COS-7 cells;
- murine primary cardiac microvascular endothelial cells (CME);
- murine mast cell line C57.1;
- primary human umbilical vein endothelial cells (HUVEC);
- F9 embryonal carcinoma cells;
- rat fat pad endothelial cells (RFPEC);
- L cells (e.g., murine LTA/tff cells); and
- cells derived from the transgenic animals of the invention.

The transformed host cells may also be fetal cells, embryonic stem cells, zygotes, gametes, or germ line cells. Transformed embryonic stem cells, zygotes, gametes, and germ line cells, as well as other mammalian cells, may be used to produce transgenic animals in which the expression of 3-OST genes has been altered (e.g., knock-outs, enhanced expression, ectopic expression).

In another aspect, the present invention provides substantially pure protein preparations comprising at least a functional fragment of a 3-OST protein. Thus, in one embodiment, the present invention provides a substantially pure protein preparation comprising mature murine 3-OST-1 or mature human 3-OST-1. In another embodiment, the 3-OST protein is selected from the group consisting of 3-OST-1, 3-OST-2, 3-OST-3A, 3-OST-3B, 3-OST-4, and ce3-OST. In another embodiment, the fragment comprises a 3-O-sulfotransferase domain of a 3-OST protein selected from the group consisting of 3-OST-1, 3-OST-2, 3-OST-3A, 3-OST-3B, 3-OST-4, and ce3-OST.

In particular embodiments, the present invention provides a substantially pure protein preparation in which the 3-OST protein comprises an amino acid sequence selected from:
(a) SEQ ID NO: 2;
(b) SEQ ID NO: 4;
(c) SEQ ID NO: 6;
(d) SEQ ID NO: 8;
(e) SEQ ID NO: 10;
(f) SEQ ID NO: 12;
(g) SEQ ID NO: 15; and
(h) a sequence having at least 60% amino acid similarity with at least one of (a)-(g) and having sequence-specific HS binding affinity or 3-O-sulfotransferase activity.

In other particular embodiments, the present invention provides a substantially pure protein preparation in which the 3-OST protein comprises an amino acid sequence selected from:
(a) residues 21-52, 260-269, 250-276, 53-311, or 21-307 of SEQ ID NO: 2;
(b) residues 21-48, 256-265, 246-272, 49-307, or 21-303 of SEQ ID NO: 4;
(c) residues 42-109, 313-325, 303-332, or 110-367 of SEQ ID NO: 6;
(d) residues 44-147, 351-363, 341-370, or 148-406 of SEQ ID NO: 8;
(e) residues 66-132, 336-348, 326-355, or 133-390 of SEQ ID NO: 10;
(f) residues 396-408, 386-415, or 207-456 of SEQ ID NO: 12;
(g) residues 240-250, 230-257, or 23-291 of SEQ ID NO: 15;
(h) a sequence having at least 60% amino acid sequence similarity with at least one of (a)-(g) and encoding a functional fragment having sequence-specific HS binding affinity or 3-O-sulfotransferase activity; and
(i) a sequence comprising a chimera of at least two of sequences (a)-(h).
In another aspect, the present invention provides for antibodies and methods for making
antibodies which selectively bind with the 3-OST proteins. These antibodies include monoclonal 
and polyclonal antibodies, as well as functional antibody fragments such as F(ab) and Fc. 

In another aspect, the present invention provides for methods for producing the above-described proteins. Thus, in one set of embodiments, the isolated nucleic acids of the invention may be used to transform host cells or create transgenic animals which express the proteins of the invention. The proteins may then be substantially purified from the cells or animals by standard methods. Alternatively, the isolated nucleic acids of the invention may be used in cell-free in vitro translation systems to produce the proteins of the invention.

In another aspect, the present invention provides methods for 3-O-sulfating saccharide residues within a preparation of glycosaminoglycan or proteoglycan polysaccharides. The preparation is contacted with at least a 3-O-sulfotransferase domain of a 3-OST protein in the presence of a sulfate donor under conditions which permit sulfation of the residues. The 3-OST protein is selected from 3-OST-1, 3-OST-2, 3-OST-3A, 3-OST-3B, 3-OST-4, and ce3-OST proteins, as well as conservative substitution variants and/or chimeras thereof.

In particular embodiments, the present invention provides methods for 3-O-sulfating saccharide residues within a preparation of glycosaminoglycan or proteoglycan polysaccharides in which the polysaccharides include a polysaccharide sequence of GlcA→GlcNS ±6S. These methods comprise contacting the GlcA→GlcNS ±6S-containing polysaccharide preparation with a 3-OST-1 protein in the presence of a sulfate donor under conditions which permit the 3-OST-1 to convert the GlcA→GlcNS ±6S sequence to GlcA→GlcNS 3S ±6S.

In particular embodiments, the GlcA→GlcNS ±6S sequence comprises a part of an HSact precursor sequence, i.e.:
- IdoA→GlcNAc 6S→GlcA→GlcNS ±6S→IdoA 2S→GlcNS 6S; or
- IdoA→GlcNS 6S→GlcA→GlcNS ±6S→IdoA 2S→GlcNS 6S.

Alternatively, it comprises a part of an HSinact precursor sequence, i.e.:
- IdoA→GlcNAc→GlcA→GlcNS ±6S→IdoA 2S→GlcNS 6S;
- IdoA→GlcNS→GlcA→GlcNS ±6S→IdoA 2S→GlcNS 6S;
- IdoA→GlcNAc 6S→GlcA→GlcNS ±6S→IdoA 2S→GlcNS; or
- IdoA→GlcNS 6S→GlcA→GlcNS ±6S→IdoA 2S→GlcNS.

Conversion of the HSact precursor pool to HSact increases the fraction with AT-binding activity and is particularly useful in the production of anticoagulant heparan sulfate products. Thus, in another embodiment, the present invention provides a means of enriching the AT-binding fraction of a heparan sulfate pool by contacting the polysaccharide preparation with 3-OST-1 protein in the presence of a sulfate donor under conditions which permit the 3-OST HSact conversion activity. The 3-OST-1 protein for use in these methods is selected from:
- murine 3-OST-1;
- human 3-OST-1;
- mature murine 3-OST-1;
- mature human 3-OST-1;
- a functional fragment of a 3-OST-1 having 3-O-sulfotransferase activity;
- a conservative substitution variant of 3-OST-1 having 3-O-sulfotransferase activity; and
- a chimeric 3-OST-1 having 3-O-sulfotransferase activity.

In preferred embodiments, the sulfate donor is 3'-phosphoadenosine 5'-phosphosulfate (PAPS).

Similarly, the present invention provides methods for 3-O-sulfating saccharide residues within a preparation of glycosaminoglycan or proteoglycan polysaccharides by contacting the preparation with at least a 3-O-sulfotransferase domain of a 3-OST protein in the presence of a sulfate donor under conditions which permit sulfation of the residues. In these methods, the 3-OST protein is selected from 3-OST-2, 3-OST-3A, 3-OST-3B, 3-OST-4, ce3-OST, and conservative substitution variants or chimeras thereof.

In particular embodiments, the polysaccharides include a polysaccharide sequence of GlcA 2S→GlcNS. These methods comprise contacting the GlcA 2S→GlcNS-containing polysaccharide preparation with a 3-OST-2 protein in the presence of a sulfate donor under conditions which permit the 3-OST-2 protein to convert the GlcA 2S→GlcNS sequence to GlcA 2S→GlcNS 3S. In particular embodiments, the GlcA 2S→GlcNS sequence comprises a part of a GlcNS→GlcA 2S→GlcNS sequence.

In other particular embodiments, the polysaccharides include a polysaccharide sequence of IdoA 2S→GlcNS. These methods comprise contacting the IdoA 2S→GlcNS-containing polysaccharide preparation with a 3-OST-3 protein in the presence of a sulfate donor under conditions which permit the 3-OST-3 protein to convert the IdoA 2S→GlcNS sequence to IdoA 2S→GlcNS 3S. In particular embodiments, the IdoA 2S→GlcNS sequence comprises a part of a GlcNS→IdoA 2S→GlcNS sequence.

The 3-OST proteins for use in these methods are selected from:
- 3-OST-2, 3-OST-3A, 3-OST-3B, 3-OST-4, and ce3-OST;
- functional fragments of these 3-OSTs having 3-O-sulfotransferase activity;
- conservative substitution variants of these 3-OSTs having 3-O-sulfotransferase activity; and
- chimeric 3-OSTs having 3-O-sulfotransferase activity.

In preferred embodiments, the sulfate donor is 3'-phosphoadenosine 5'-phosphosulfate (PAPS).
In another aspect, the present invention provides methods for partially sequencing complex polysaccharides such as heparan sulfates or other glycosaminoglycans (GAGs). The method proceeds as follows:
(1) A pool of polysaccharides which includes sequences that may be 3-O-sulfated is contacted with a 3-OST protein in the presence of a sulfate donor (e.g., PAPS), under conditions which permit sulfation by the 3-OST.
(2) The treated polysaccharides are degraded by enzymes which cleave polysaccharides in a sequence-specific manner (e.g., polysaccharide lyases; heparinase I, II or III; heparitinase), and the size profile of the resulting fragments is determined.
(3) An identical pool which has not been treated with 3-OST is cleaved by the same enzymes, and its size profile is determined.
(4) The two size profiles are compared.

Changes in the size profiles indicate that 3-OST activity has modified the saccharide units so as to prevent (or permit) cleavage at sites which previously were (or were not) cleaved. Thus, comparison of the profiles indicates the positions at which target sequences for 3-OST activity are present and provides a partial polysaccharide sequence.
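The profile comparison described above can be sketched as a toy model. The following Python makes three simplifying assumptions, for illustration only:
- cleavable linkages sit at known positions along a chain measured in hypothetical disaccharide units;
- 3-O-sulfation simply blocks some of those linkages;
- the actual substrate specificities of heparinase I, II or III and heparitinase are not modeled.

The function names are illustrative.

```python
from __future__ import annotations

from collections import Counter


def digest(chain_length: int, cut_sites: set[int]) -> Counter:
    """Size profile of fragments produced by cutting a chain at the given sites.

    Sites are hypothetical unit boundaries (0 < site < chain_length); the
    result maps fragment size -> number of fragments of that size.
    """
    bounds = [0] + sorted(s for s in cut_sites if 0 < s < chain_length) + [chain_length]
    return Counter(end - start for start, end in zip(bounds, bounds[1:]))


def compare_profiles(untreated: Counter, treated: Counter) -> tuple[Counter, Counter]:
    """Fragment sizes lost and gained after 3-OST treatment."""
    return untreated - treated, treated - untreated


# Toy example: a 10-unit chain with lyase-sensitive sites after units 2, 5 and 8.
# Suppose 3-O-sulfation by a 3-OST renders the site after unit 5 resistant.
untreated = digest(10, {2, 5, 8})
treated = digest(10, {2, 8})
lost, gained = compare_profiles(untreated, treated)
print(lost)    # Counter({3: 2})
print(gained)  # Counter({6: 1})
```

In this toy case, the loss of two 3-unit fragments together with the appearance of one 6-unit fragment points to a single newly resistant site joining them.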

In another embodiment, the sequence of complex polysaccharides such as HS or GAGs may be partially determined using sequence-specific polysaccharide affinity fractionation. To this end, 3-OST proteins which lack enzymatic function but retain sequence-specific HS or GAG binding capacity can be identified or produced (e.g., by altering or deleting a portion of the catalytic ST domain through site-directed or deletion mutagenesis). These inactive forms will bind HS or GAGs in a sequence-dependent manner and allow sequence-specific saccharide affinity fractionation from complex mixtures of GAGs. The purified structures may be degraded in a step-wise fashion with exolytic enzymes, endolytic enzymes, and/or nitrous acid, and the resulting degradation products can be compared to standard compounds of known structure. This method allows the quantitation and characterization of known structures contained within unknown complex polysaccharide samples.

In another embodiment, partial sequence information can be obtained by using the 3-OSTs of the invention, or other heparan sulfate sequence-specific binding ligands, as protective groups before treating the HS or GAG with modifying agents that detectably alter it.
- Useful protective groups include catalytically inactive enzymes, chimeric enzymes, and small molecule ligands with identified sequence binding specificities.
- Useful modifying agents include catalytically active heparan lyases, sulfotransferases, N-deacetylases, N-acetyltransferases, epimerases, or chimeric proteins of the invention.

The protecting group is contacted with the heparan sulfate or other glycosaminoglycans (GAGs), and the resultant complex is treated with one or more modifying agents. Where multiple protecting groups and/or modifying reagents are used in combination, the sample is first contacted with the protective group(s). One or more modifying reagents are then contacted with the protected polysaccharide, either simultaneously or in turn. The protective group(s) will interfere with the ability of a modifying agent to interact with, attach to and/or cleave specific GAG sequence motifs. The sample can then be analyzed for ligand-specific protection and/or cleavage, using separation and/or quantitation methods known in the art, to elucidate the sequence of the original GAG.

In another set of embodiments, the present invention provides isolated nucleic acids comprising genetic regulatory sequences of a 3-OST gene operably joined to a marker gene. Such regulatory sequences include 5' untranslated regions such as promoter and operator sequences. The 5' regulatory sequences of the human 3-OST-4 gene (as well as coding regions) are disclosed herein as SEQ ID NO: 16. Such regulatory regions may be used to transform host cells, which are useful in methods of identifying compounds capable of modulating the expression of the 3-OST gene. Thus, in such methods, a candidate compound is contacted with a host cell transformed with a marker gene operably joined to the 3-OST regulatory regions, and changes in expression of the marker gene indicate the ability of the candidate compound to modulate 3-OST expression.

In another aspect, the present invention also provides methods for diagnosing individuals 
with disorders involving heparan sulfate biosynthesis comprising assaying such individuals for the 
presence of mutations in 3-OST genes/proteins. Such assays include nucleic acid based assays 
(employing the nucleic acids of the present invention), protein based assays (employing the 
antibodies of the present invention), and HS based assays employing the glycosaminoglycan 
sequencing methods of the present invention. 

These and other aspects of the present invention will be apparent to one of ordinary skill 
in the art from the following detailed description. 

Brief Description of the Drawings 
Fig. 1 is an alignment of the amino acid sequences of murine and human 3-OST-l proteins 
showing the high degree of homology. Vertical bars ( | ) between residues indicate identical 
residues. 

Fig. 2 is an alignment of the sulfotransferase domains of human NST-1, human NST-2, 
C. elegans 3-OST, human 3-OST-4, human 3-OST-3A, human 3-OST-2, and human 3-OST-l. 
Fig. 3 is a schematic depiction of the structures of the 3-OST-1, 3-OST-2, 3-OST-3A,
3-OST-3B and 3-OST-4 proteins.

Detailed Description of the Invention 

Definitions 

In order to more clearly and distinctly point out and describe the subject matter that applicants regard as the invention, the following definitions are provided for certain terms used in the following written description and the appended claims.

Isolated nucleic acids. As used herein with respect to nucleic acids derived from naturally-occurring sequences, the term "isolated nucleic acid" means a ribonucleic or deoxyribonucleic acid which comprises a naturally-occurring nucleotide sequence and which is manipulable by standard recombinant DNA techniques, but which is not covalently joined to the nucleotide sequences that are immediately contiguous on its 5' and 3' ends in the naturally-occurring genome of the organism from which it is derived. As used herein with respect to synthetic nucleic acids, the term "isolated nucleic acid" means a ribonucleic or deoxyribonucleic acid which comprises a nucleotide sequence which does not occur in nature and which is manipulable by standard recombinant DNA techniques.

An isolated nucleic acid is manipulable by standard recombinant DNA techniques when it may be used in, for example:
- amplification by polymerase chain reaction (PCR);
- in vitro translation;
- ligation to other nucleic acids (e.g., cloning or expression vectors);
- restriction from other nucleic acids (e.g., cloning or expression vectors);
- transformation of cells;
- hybridization screening assays;
- or the like.

The term "isolated nucleic acids" is also intended to embrace synthetic oligonucleotides such as peptide nucleic acids (PNAs), nucleotides joined by phosphorothioate or other non-phosphodiester linkages, nucleic acids incorporating functionally equivalent nucleotide analogs, and the like.

Transformation. As used herein, the term "transformation" means any method of introducing an exogenous nucleic acid into a cell which results in transient or stable expression of said nucleic acid, or in integration of said nucleic acid into the genome of said host cell or descendant thereof. Such methods include, but are not limited to:
- transformation;
- transfection;
- electroporation;
- microinjection;
- direct injection of naked nucleic acid;
- particle-mediated delivery;
- viral-mediated transduction; or
- any other means of delivering a nucleic acid into a host cell.

Substantially pure. As used herein with respect to protein preparations, the term "substantially pure" means a preparation which contains at least 60% (by dry weight) of the protein of interest, exclusive of the weight of other intentionally included compounds. Preferably the preparation is at least 75%, more preferably at least 90%, and most preferably at least 99%, by dry weight, the protein of interest, exclusive of the weight of other intentionally included compounds. Purity can be measured by any appropriate method, e.g., column chromatography, gel electrophoresis, or HPLC analysis.

If a preparation intentionally includes two or more different proteins of the invention, a "substantially pure" preparation means a preparation in which the total dry weight of the proteins of the invention is at least 60% of the total dry weight, exclusive of the weight of other intentionally included compounds. Preferably, for such preparations containing two or more proteins of the invention, the total weight of the proteins of the invention is at least 75%, more preferably at least 90%, and most preferably at least 99%, of the total dry weight of the preparation, exclusive of the weight of other intentionally included compounds.

Thus, the proteins of the invention may be mixed with one or more other proteins (e.g., serum albumin, 6-OST) or compounds (e.g., diluents, detergents, excipients, salts, polysaccharides, sugars, lipids) for purposes of administration, stability, storage, and the like. In that case, the weight of such other proteins or compounds is ignored in the calculation of the purity of the preparation.
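The purity calculation reduces to a ratio. Intentionally included compounds are removed from the denominator, while unintended contaminants stay in it. The following is a minimal Python sketch using made-up masses; the function names and example quantities are illustrative assumptions.

```python
def purity_fraction(invention_protein_mg: float, total_dry_mg: float,
                    intentional_additives_mg: float) -> float:
    """Dry-weight fraction of the protein(s) of the invention, ignoring the weight
    of intentionally included compounds (e.g., carrier proteins, excipients, salts)."""
    basis = total_dry_mg - intentional_additives_mg
    if basis <= 0:
        raise ValueError("intentional additives must weigh less than the preparation")
    return invention_protein_mg / basis


def highest_tier_met(fraction: float, tiers=(0.99, 0.90, 0.75, 0.60)):
    """Highest purity tier met, or None if below the 60% floor."""
    for tier in tiers:
        if fraction >= tier:
            return tier
    return None


# Hypothetical: 8 mg 3-OST + 90 mg intentionally added serum albumin + 2 mg contaminants.
fraction = purity_fraction(8.0, 100.0, 90.0)
print(fraction, highest_tier_met(fraction))  # 0.8 0.75
```

Although the 3-OST is only 8% of the total mass here, the preparation still counts as 80% pure under the definition. This is because the albumin was added on purpose.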

Similarity. As used herein with respect to amino acid sequences, the "similarity" between two sequences means the percentage of amino acid residue positions, after aligning the sequences according to standard techniques, at which the two sequences have identical or similar residues. In general, "similar" residues include those which are regarded in the art as "conservative substitutions" (see, e.g., Dayhoff et al. (1978), Atlas of Protein Sequence and Structure Vol. 5 (Suppl. 3), pp. 345-352, Natl. Biomed. Res. Found., Washington, D.C.), which fall within the groups:
(a) methionine, leucine, isoleucine and valine;
(b) phenylalanine, tyrosine and tryptophan;
(c) lysine, arginine and histidine;
(d) alanine and glycine;
(e) serine and threonine;
(f) glutamine and asparagine; and
(g) glutamate and aspartate.

Similar residues also include those which are otherwise shown to have no substantial effect on the biological activity of the protein. Numerical values for similarity were determined using the PileUp program, which performs multiple sequence alignments based on the methods of Feng and Doolittle (1987), J. Mol. Evol. 25: 351-360, and Higgins and Sharp (1989), CABIOS 5: 151-153. For each sequence alignment, the gap weight was set at 3.0 and the gap length weight was set at 0.10. Percentages of similarity recited in the appended claims may be determined by these methods.
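Given an alignment, the similarity percentage defined above can be computed from the seven conservative-substitution groups (a)-(g). The following Python sketch rests on several assumptions:
- the sequences are already aligned, and the sketch does not reproduce PileUp's alignment or its gap weights;
- gapped columns are excluded from the denominator;
- residues "otherwise shown" to be silent substitutions are not modeled.

The function name is hypothetical.

```python
# Conservative-substitution groups (a)-(g) from the definition above.
CONSERVATIVE_GROUPS = ("MLIV", "FYW", "KRH", "AG", "ST", "QN", "ED")
GROUP_OF = {aa: i for i, group in enumerate(CONSERVATIVE_GROUPS) for aa in group}


def percent_similarity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of gap-free aligned positions holding identical or similar residues.

    Assumes both sequences are already aligned and padded with gap characters
    to equal length; columns containing a gap are not counted.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = similar = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y or (x in GROUP_OF and GROUP_OF[x] == GROUP_OF.get(y)):
            similar += 1
    return 100.0 * similar / compared if compared else 0.0


print(percent_similarity("MKTA", "LRSG"))  # 100.0 (every pair is a conservative swap)
print(percent_similarity("MKTA", "MKTW"))  # 75.0
```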

Chimeric protein. As used herein, the term "chimeric protein" means a protein having an amino acid sequence which is a positionally conserved combination of the amino acid sequences of two or more other proteins. A chimera of two or more reference proteins is constructed as follows:
(1) The amino acid sequences of the reference proteins are aligned by standard techniques to identify residues which correspond at each position, allowing for relative insertions/deletions as necessary.
(2) For each amino acid position of the chimeric protein, an amino acid residue is selected from the residues present at corresponding positions in the two or more reference proteins. No residue is placed in the chimera when deletions are present amongst the reference proteins.

The resultant chimera has an amino acid sequence which is a combination of the reference amino acid sequences, in which the relative position of each residue selected from the reference sequences is conserved within the chimera.
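The position-by-position selection rule above can be sketched as follows. This is a minimal Python illustration, assuming the references are already aligned and gap-padded to equal width. A per-position source map says which reference supplies each residue. A block swap, such as joining one isoform's N-terminal region to another's ST domain, corresponds to a map that switches reference once. The references are toy strings, not real 3-OST sequences, and the function name is hypothetical.

```python
from __future__ import annotations


def build_chimera(aligned_refs: list[str], source_at: list[int], gap: str = "-") -> str:
    """Assemble a chimera by taking, at each aligned position, the residue of the
    chosen reference; a gap in the chosen reference contributes no residue."""
    width = len(aligned_refs[0])
    if any(len(ref) != width for ref in aligned_refs) or len(source_at) != width:
        raise ValueError("references and source map must share one alignment width")
    residues = (aligned_refs[src][pos] for pos, src in enumerate(source_at))
    return "".join(aa for aa in residues if aa != gap)


refs = ["MKT-A", "MRSGA"]                    # toy aligned references
print(build_chimera(refs, [0, 1, 0, 0, 1]))  # MRTA
print(build_chimera(refs, [0, 0, 1, 1, 1]))  # MKSGA (a single block swap)
```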

Heparan sulfate. As used herein, the term "heparan sulfate" or the abbreviation "HS" means a polysaccharide of the form ([→4-D-GlcApβ1 or →4-L-IdoApα1]→4-D-GlcNp[Ac or S]α1→)n which is modified to a variable extent by sulfation of the 2-O-position of GlcA and IdoA residues, and of the 6-O- and 3-O-positions of GlcN[Ac or S] residues. Therefore, this definition encompasses all glycosaminoglycan compounds referred to as heparan(s), heparan sulfate(s), heparin(s), heparin sulfate(s), heparitin(s), heparitin sulfate(s), heparanoid(s), and heparosan(s). The heparan molecules may be pure glycosaminoglycans or may be linked to other molecules, including polymers such as proteins, as well as lipids or small molecules such as biotin.

The Heparan Sulfate D-Glucosaminyl 3-O-Sulfotransferases

The present invention depends, in part, upon the identification and molecular cloning of cDNAs encoding mammalian heparan sulfate D-glucosaminyl 3-O-sulfotransferases (3-OSTs). These proteins have been designated 3-OST-1, 3-OST-2, 3-OST-3A, 3-OST-3B, and 3-OST-4. In addition, a nematode 3-OST from C. elegans, ce3-OST, has been identified.

3-OST-1s. Disclosed herein are the isolation and identification of murine and human 3-OST-1
cDNAs (SEQ ID NO: 1 and SEQ ID NO: 3, respectively). The coding regions of these cDNAs
extend from nucleotide positions 323-1255 of SEQ ID NO: 1 and positions 119-1039 of SEQ ID
NO: 3, respectively. The protein coding portions of the cDNAs are 85% identical and encode
proteins of 311 and 307 amino acids (SEQ ID NO: 2 and SEQ ID NO: 4, respectively) which are
93% similar. The murine and human protein sequences are aligned in Figure 1. Each protein
includes a twenty-residue presumptive signal peptide (residues 1-20 of SEQ ID NO: 2 and SEQ
ID NO: 4) which is cleaved off to form the mature protein. The mouse 3-OST-1 contains an extra
four residues (Ala24-Pro25-Gly26-Pro27) not found in the human form. Each protein has five
potential N-glycosylation sites (at residues 52-54, 141-143, 196-198, 246-248 and 253-255 of SEQ
ID NO: 2, and residues 48-50, 137-139, 192-194, 242-244 and 249-251 of SEQ ID NO: 4).
N-glycosylation of at least some of these sites appears important to 3-OST protein stability,
specificity and/or activity. After the 3-OST-1 signal peptide, there is a domain rich in the
residues S, P, L, A, and G (SPLAG-rich domain) (residues 21-52 of SEQ ID NO: 2 and residues
21-48 of SEQ ID NO: 4). 3-OST-1 and all known NST species possess a homologous carboxy
terminal sulfotransferase (ST) domain of ~260 amino acids (residues 53-311 of SEQ ID NO: 2
and residues 49-307 of SEQ ID NO: 4) that exhibits homology to all known sulfotransferases and
which includes the minimal fragment necessary for sulfation activity. Figure 2 shows a sequence
alignment of the ST domains of the sulfotransferases NST-1 (SEQ ID NO: 13), NST-2 (SEQ ID
NO: 14), OST-1, OST-2, OST-3A/B, and OST-4. Within this region is a conserved sequence (at
residues 260-269 of SEQ ID NO: 2, and 256-265 of SEQ ID NO: 4) which is a presumptive
cysteine-bridged peptide loop thought to be involved in heparan sulfate substrate specificity. This
cysteine-bridged peptide loop is part of the larger HS-binding domain (residues 250-276 of SEQ
ID NO: 2 and 246-272 of SEQ ID NO: 4). A conserved lysine residue (residue 68 of SEQ ID NO:
2, and 64 of SEQ ID NO: 4) is presumptively catalytic.
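As a simple consistency check on the coordinates above, the length of each coding region can be converted into a number of codons; for both 3-OST-1 cDNAs the span length divided by three equals the stated number of amino acids. The sketch below assumes 1-based, inclusive nucleotide coordinates, as used in the text; the function name is illustrative only.

```python
# Sketch: relate the stated 3-OST-1 coding-region coordinates to the stated
# protein lengths. Coordinates are assumed to be 1-based and inclusive.

def codons_in_span(start: int, end: int) -> int:
    """Return the number of whole codons in an inclusive 1-based span."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"span {start}-{end} is {length} nt, not a multiple of 3")
    return length // 3

# (start, end, stated amino acids), taken from the description above.
coding_regions = {
    "murine 3-OST-1 (SEQ ID NO: 1)": (323, 1255, 311),
    "human 3-OST-1 (SEQ ID NO: 3)": (119, 1039, 307),
}

for name, (start, end, stated) in coding_regions.items():
    n = codons_in_span(start, end)
    print(f"{name}: {end - start + 1} nt = {n} codons; {stated} residues stated")
```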

The 3-OST-1 proteins have 3-O-sulfotransferase activity on polysaccharide sequences including
the sequence GlcA→GlcNS ±6S, and convert this polysaccharide sequence to the sequence
GlcA→GlcNS 3S ±6S. Of particular importance, the 3-OST-1 proteins are useful in converting
HSinact precursor sequences (i.e., IdoA→GlcNAc 6S→GlcA→GlcNS ±6S→IdoA 2S→GlcNS 6S; or
IdoA→GlcNS 6S→GlcA→GlcNS ±6S→IdoA 2S→GlcNS 6S) to HSact. The 3-OST-1 proteins are
highly expressed in endothelial cells and in brain and kidney tissues, and to a lesser extent in
heart, lung, skeletal muscle and placenta. The human 3-OST-1 gene has been localized to
chromosome 4, and more particularly to chromosome segment 4p15-16.

3-OST-2s. Also disclosed herein are the isolation and identification of a human 3-OST-2 cDNA
(SEQ ID NO: 5). The coding region of this cDNA extends from nucleotide positions 73-1173 of
SEQ ID NO: 5. The cDNA encodes a protein of 367 amino acids (SEQ ID NO: 6). The protein has
four potential N-glycosylation sites (at residues 102-104, 193-195, 235-237 and 306-308 of SEQ ID
NO: 6). N-glycosylation of at least some of these sites appears important to 3-OST protein
stability, specificity and/or activity. The 3-OST-2 protein has a putative N-terminal cytoplasmic
domain (residues 1-19 of SEQ ID NO: 6), followed by a putative transmembrane domain (residues
20-41 of SEQ ID NO: 6), followed by a SPLAG-rich domain (residues 42-109 of SEQ ID NO: 6).
This is followed by the characteristic carboxy terminal ST domain of ~260 amino acids (residues
110-367 of SEQ ID NO: 6) that exhibits homology to all known sulfotransferases and which
includes the minimal fragment necessary for sulfation activity. Within this region is a conserved
sequence (at residues 313-325 of SEQ ID NO: 6) which is a presumptive cysteine-bridged peptide
loop thought to be involved in heparan sulfate substrate specificity. This cysteine-bridged peptide
loop is part of the larger HS-binding domain (residues 303-332 of SEQ ID NO: 6). A conserved
lysine residue (residue 24 of SEQ ID NO: 6) is presumptively catalytic. A cDNA of an allelic
variant has also been identified, which includes four silent nucleotide substitutions (G→A at bp
804, T→G at bp 1249, T→C at bp 1350, and C→T at bp 1507 of SEQ ID NO: 5) which do not
affect the encoded protein.
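Assuming the coding region is exactly nucleotides 73-1173 of SEQ ID NO: 5 (1-based, inclusive), a short calculation locates each substitution of the allelic variant relative to the coding region: three lie 3' of it, and the one at bp 804 falls at the third position of codon 244, a position at which many substitutions are synonymous. This is consistent with the substitutions being silent; confirming that the bp 804 change is synonymous would require the actual codon from SEQ ID NO: 5, which this sketch does not use. The function name is illustrative only.

```python
# Sketch: locate each reported substitution of the 3-OST-2 allelic variant
# relative to the coding region of SEQ ID NO: 5 (assumed 1-based, inclusive).

CDS_START, CDS_END = 73, 1173

def locate(position: int) -> str:
    """Describe where a 1-based cDNA position falls relative to the CDS."""
    if position < CDS_START:
        return "5' UTR"
    if position > CDS_END:
        return "3' UTR"
    offset = position - CDS_START
    codon = offset // 3 + 1            # 1-based codon number
    codon_position = offset % 3 + 1    # 1, 2 or 3 within that codon
    return f"codon {codon}, position {codon_position}"

variants = {804: "G>A", 1249: "T>G", 1350: "T>C", 1507: "C>T"}
for pos, change in variants.items():
    print(pos, change, locate(pos))
# -> 804 G>A codon 244, position 3
# -> 1249 T>G 3' UTR   (likewise for 1350 and 1507)
```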

The 3-OST-2 proteins have 3-O-sulfotransferase activity on polysaccharide sequences including
the sequences GlcA 2S→GlcNS or GlcNS→GlcA 2S→GlcNS, and convert these polysaccharide
sequences to GlcA 2S→GlcNS 3S or GlcNS→GlcA 2S→GlcNS 3S, respectively. The 3-OST-2
proteins are not expressed in endothelial cells, but are highly expressed in brain tissues, and to a
lesser extent in heart, lung, skeletal muscle and placenta. The human 3-OST-2 gene has been
localized to chromosome 16, and more particularly to chromosome segment 16p12.3.

3-OST-3As. Also disclosed herein are the isolation and identification of a human 3-OST-3A cDNA
(SEQ ID NO: 7). The coding region of this cDNA extends from nucleotide positions 799-2016 of
SEQ ID NO: 7. The cDNA encodes a protein of 406 amino acids (SEQ ID NO: 8). The protein has
two potential N-glycosylation sites (at residues 273-275 and 344-346 of SEQ ID NO: 8).
N-glycosylation of one or more of these sites appears important to 3-OST protein stability,
specificity and/or activity. The 3-OST-3A protein has a putative N-terminal cytoplasmic domain
(residues 1-24 of SEQ ID NO: 8), followed by a putative transmembrane domain (residues 25-43
of SEQ ID NO: 8), followed by a SPLAG-rich domain (residues 44-147 of SEQ ID NO: 8). This is
followed by the characteristic carboxy terminal ST domain of ~260 amino acids (residues 148-406
of SEQ ID NO: 8) that exhibits homology to all known sulfotransferases and which includes the
minimal fragment necessary for sulfation activity. Within this region is a conserved sequence (at
residues 351-363 of SEQ ID NO: 8) which is a presumptive cysteine-bridged peptide loop thought
to be involved in heparan sulfate substrate specificity. This cysteine-bridged peptide loop is part
of the larger HS-binding domain (residues 341-370 of SEQ ID NO: 8). A conserved lysine residue
(residue 162 of SEQ ID NO: 8) is presumptively catalytic.
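The residue ranges given for 3-OST-3A can be encoded as a small domain map and checked for internal consistency: the four domains should be contiguous and cover residues 1-406, the cysteine-bridged loop should lie within the HS-binding domain, which in turn should lie within the ST domain, and the catalytic lysine should fall in the ST domain. The class and function names below are illustrative; only the ranges come from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    def __len__(self) -> int:
        return self.end - self.start + 1

    def contains(self, start: int, end: int) -> bool:
        return self.start <= start and end <= self.end

# Ranges stated for 3-OST-3A (SEQ ID NO: 8).
LENGTH = 406
DOMAINS = [
    Region("cytoplasmic", 1, 24),
    Region("transmembrane", 25, 43),
    Region("SPLAG-rich", 44, 147),
    Region("ST domain", 148, 406),
]
HS_BINDING = Region("HS-binding", 341, 370)
LOOP = Region("cysteine-bridged loop", 351, 363)
CATALYTIC_LYS = 162

def check_layout() -> list:
    """Return a list of inconsistencies (empty if the map is consistent)."""
    problems = []
    if DOMAINS[0].start != 1 or DOMAINS[-1].end != LENGTH:
        problems.append("domains do not span the whole protein")
    for left, right in zip(DOMAINS, DOMAINS[1:]):
        if right.start != left.end + 1:
            problems.append(f"gap or overlap between {left.name} and {right.name}")
    st = DOMAINS[-1]
    if not st.contains(HS_BINDING.start, HS_BINDING.end):
        problems.append("HS-binding domain lies outside the ST domain")
    if not HS_BINDING.contains(LOOP.start, LOOP.end):
        problems.append("loop lies outside the HS-binding domain")
    if not st.contains(CATALYTIC_LYS, CATALYTIC_LYS):
        problems.append("catalytic lysine lies outside the ST domain")
    return problems

print(check_layout())     # -> []
print(len(DOMAINS[-1]))   # -> 259
```

The same structure can be filled with the ranges given for the other 3-OSTs.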

The 3-OST-3A proteins have 3-O-sulfotransferase activity on polysaccharide sequences including
the sequences IdoA 2S→GlcNS or GlcNS→IdoA 2S→GlcNS, and convert these polysaccharide
sequences to IdoA 2S→GlcNS 3S or GlcNS→IdoA 2S→GlcNS 3S, respectively. The 3-OST-3A
proteins are not expressed in endothelial cells, but are highly expressed in kidney, placenta and
liver tissues, and to a lesser extent in brain, heart, lung, and skeletal muscle.

3-OST-3Bs. Also disclosed herein are the isolation and identification of a human 3-OST-3B cDNA
(SEQ ID NO: 9). The coding region of this cDNA extends from nucleotide positions 331-1500 of
SEQ ID NO: 9. The cDNA encodes a protein of 390 amino acids (SEQ ID NO: 10). The protein has
two potential N-glycosylation sites (at residues 258-260 and 329-331 of SEQ ID NO: 10).
N-glycosylation of one or more of these sites appears important to 3-OST protein stability,
specificity and/or activity. The 3-OST-3B protein has a putative N-terminal cytoplasmic domain
(residues 1-32 of SEQ ID NO: 10), followed by a putative transmembrane domain (residues 33-65
of SEQ ID NO: 10), followed by a SPLAG-rich domain (residues 66-132 of SEQ ID NO: 10). This is
followed by the characteristic carboxy terminal ST domain of ~260 amino acids (residues 133-390
of SEQ ID NO: 10) that exhibits homology to all known sulfotransferases and which includes the
minimal fragment necessary for sulfation activity. Within this region is a conserved sequence (at
residues 336-348 of SEQ ID NO: 10) which is a presumptive cysteine-bridged peptide loop thought
to be involved in heparan sulfate substrate specificity. This cysteine-bridged peptide loop is part
of the larger HS-binding domain (residues 326-355 of SEQ ID NO: 10). A conserved lysine residue
(residue 147 of SEQ ID NO: 10) is presumptively catalytic.
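Potential N-glycosylation sites such as those listed above are conventionally identified by the sequon Asn-X-Ser/Thr, where X is any residue except proline, and each site is reported as the three-residue span it occupies (e.g., 258-260). A minimal regular-expression scan is sketched below under that assumption. Because SEQ ID NO: 10 is not reproduced here, it is run on a made-up sequence; a sequon marks only a potential site, and actual glycosylation must be determined experimentally.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr. The lookahead makes the
# match zero-width so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")

def sequon_sites(protein: str) -> list:
    """Return 1-based inclusive (start, end) triplets of N-X-S/T sequons."""
    return [(m.start() + 1, m.start() + 3) for m in SEQUON.finditer(protein.upper())]

# Hypothetical toy sequence (NOT SEQ ID NO: 10). "NPT" is skipped because X = P.
toy = "MKNLSAANPTQNGTW"
print(sequon_sites(toy))   # -> [(3, 5), (12, 14)]
```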

The 3-OST-3B proteins have 3-O-sulfotransferase activity on polysaccharide sequences including
the sequences IdoA 2S→GlcNS or GlcNS→IdoA 2S→GlcNS, and convert these polysaccharide
sequences to IdoA 2S→GlcNS 3S or GlcNS→IdoA 2S→GlcNS 3S, respectively. The 3-OST-3A
proteins are not expressed in endothelial cells, but are highly expressed in kidney, placenta and
liver tissues, and to a lesser extent in brain, heart, lung, and skeletal muscle.
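The acceptor specificities described above for 3-OST-1, 3-OST-2 and 3-OST-3A/3B differ mainly in the uronic acid written immediately before (to the left of) the target GlcNS. The toy model below captures only that distinction. The string encoding of residues (e.g., "GlcA2S", "GlcNS6S") and the function names are hypothetical, and the rule deliberately ignores any longer sequence context, such as the precursor sequences described for 3-OST-1.

```python
# Toy model of 3-O-sulfation specificity. Residues are written left to right in
# the same order as the text (e.g. GlcA -> GlcNS), one string per residue.

REQUIRED_PRECEDING_URONIC_ACID = {
    "3-OST-1": "GlcA",     # GlcA -> GlcNS +/-6S
    "3-OST-2": "GlcA2S",   # GlcA 2S -> GlcNS
    "3-OST-3A": "IdoA2S",  # IdoA 2S -> GlcNS
    "3-OST-3B": "IdoA2S",  # IdoA 2S -> GlcNS
}

def is_acceptor(residue: str) -> bool:
    """True for GlcNS or GlcNS6S that is not yet 3-O-sulfated."""
    return residue in ("GlcNS", "GlcNS6S")

def sulfate(enzyme: str, chain: list) -> list:
    """Return a copy of chain with eligible GlcNS residues 3-O-sulfated."""
    uronic = REQUIRED_PRECEDING_URONIC_ACID[enzyme]
    out = list(chain)
    for i in range(1, len(chain)):
        if chain[i - 1] == uronic and is_acceptor(chain[i]):
            # Add 3S, keeping any existing 6S: GlcNS6S -> GlcNS3S6S.
            out[i] = "GlcNS3S" + chain[i][len("GlcNS"):]
    return out

chain = ["GlcNS", "GlcA", "GlcNS6S", "IdoA2S", "GlcNS", "GlcA2S", "GlcNS"]
for enzyme in REQUIRED_PRECEDING_URONIC_ACID:
    print(enzyme, sulfate(enzyme, chain))
```

In this toy chain, 3-OST-1 modifies only the GlcNS6S after GlcA, 3-OST-2 only the GlcNS after GlcA2S, and 3-OST-3A/3B only the GlcNS after IdoA2S.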

3-OST-4s. Also disclosed herein are the isolation and identification of a human 3-OST-4 nucleic
acid sequence (SEQ ID NO: 11). This sequence represents a possible or predicted heterogeneous
nuclear RNA species, and is a composite of 5' genomic sequence information and an overlapping
partial cDNA. The coding region of this sequence extends from nucleotide positions 847-2214 of
SEQ ID NO: 11, and encodes a protein of 456 amino acids (SEQ ID NO: 12). The protein has two
potential N-glycosylation sites (at residues 318-320 and 389-391 of SEQ ID NO: 12).
N-glycosylation of one or more of these sites appears important to 3-OST protein stability,
specificity and/or activity. The 3-OST-4 protein includes the characteristic carboxy terminal ST
domain of ~260 residues (residues 207-456 of SEQ ID NO: 12) that exhibits homology to all known
sulfotransferases and which includes the minimal fragment necessary for sulfation activity. Within
this region is a conserved sequence (at residues 396-408 of SEQ ID NO: 12) which is a presumptive
cysteine-bridged peptide loop thought to be positioned near the active site. This cysteine-bridged
peptide loop is part of the larger HS-binding domain (residues 386-415 of SEQ ID NO: 12). A
conserved lysine residue (residue 207 of SEQ ID NO: 12) is presumptively catalytic.

The 3-OST-4 proteins have sulfotransferase activity, but the sequence specificity of this activity
has not yet been determined. The 3-OST-4 proteins appear to be expressed at detectable levels only
in the brain. The human 3-OST-4 gene has been localized to chromosome 16, and more particularly
to chromosome segment 16p11.

C. elegans 3-OSTs. Also disclosed herein is the identification of a C. elegans homologue of the
human 3-OSTs, ce3-OST. This protein is disclosed as SEQ ID NO: 15, and includes the
characteristic carboxy terminal ST domain of ~260 residues (residues 23-291 of SEQ ID NO: 15)
that exhibits homology to all known sulfotransferases and which includes the minimal fragment
necessary for sulfation activity. Within this region is a conserved sequence (at residues 240-250 of
SEQ ID NO: 15) which is a presumptive cysteine-bridged peptide loop thought to be positioned
near the active site. This cysteine-bridged peptide loop is part of the larger HS-binding domain
(residues 230-257 of SEQ ID NO: 15). A conserved lysine residue (residue 38 of SEQ ID NO: 15)
is presumptively catalytic.

The C. elegans 3-OST proteins have sulfotransferase activity, but the sequence specificity of this
activity has not yet been determined. BLAST and Genefinder analysis of genomic cosmids predicts
that ce3-OST is an intraluminal resident protein of 291 residues encoded by 4 exons (clone F52B10,
GenBank U41990; residues 26317-26090, 21886-21732, 21682-21395, and 21345-21140).
The homology between the sulfotransferase domain of the ce3-OST and the human 3-OST and NST
proteins is illustrated in Fig. 2. Based on this sequence alignment, one may also produce chimeric
proteins between the C. elegans protein and its human homologues.
Isolated Nucleic Acids 

In one aspect, the present invention provides isolated nucleic acids encoding 3-OST proteins or
functional fragments thereof. In preferred embodiments, the 3-OST proteins are 3-OST-1 proteins,
3-OST-2 proteins, 3-OST-3A proteins, 3-OST-3B proteins, 3-OST-4 proteins, or ce3-OST proteins.
In particularly preferred embodiments, the 3-OST proteins are those disclosed as SEQ ID NO: 2,
SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 12, or SEQ ID NO: 15.
As shown in the examples below, the isolated nucleic acids encoding all or a portion of one
mammalian 3-OST protein may be used to isolate homologues in other species by standard
techniques known to those of ordinary skill in the art. Thus, the present invention also enables
isolated nucleic acids encoding the 3-OST proteins of other mammalian species including, for
example, rats, goats, sheep, cows, pigs, and non-human primates. Similarly, the isolated nucleic
acids disclosed herein may be used to screen additional human or other mammalian genetic
libraries (e.g., genomic or cDNA libraries) to identify allelic variants of the particularly disclosed
sequences. Thus, the present invention also enables isolated nucleic acids encoding human and
other mammalian 3-OST allelic variants.

In another aspect, the present invention provides isolated nucleic acids encoding functional
fragments of 3-OST proteins, 3-OST protein variants in which conservative substitutions have been
made for certain residues, or chimeric 3-OST proteins in which the sequences of two or more 3-OST
proteins have been mixed, to produce non-naturally occurring variants which retain sequence-specific
HS binding affinity and/or 3-O-sulfotransferase activity. The preferred amino acid sequences of
such variants are described below.

In preferred embodiments, the isolated nucleic acids encoding a mammalian 3-OST or functional
fragment thereof have at least 60%, preferably at least 70%, and more preferably at least 80%
nucleotide sequence identity to the coding regions of the mammalian 3-OST sequences particularly
disclosed herein (SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9
and SEQ ID NO: 11), and encode at least a functional fragment having sequence-specific HS
binding affinity and/or 3-O-sulfotransferase activity. Most preferably, the sequences have at least
90% or 95% nucleotide sequence identity to the disclosed reference sequences.
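Percent nucleotide sequence identity depends on how the two sequences are aligned and on how gaps are counted, neither of which is specified above. The sketch below assumes the sequences have already been aligned (with "-" for gaps) and counts identity over ungapped columns only, which is one common convention among several; it then reports the highest of the thresholds named above that is met. The aligned fragments are hypothetical.

```python
from typing import Optional

def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over ungapped columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # this sketch skips gapped columns; other conventions count them
        compared += 1
        identical += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared

# Identity thresholds named in the text, highest first.
THRESHOLDS = (95, 90, 80, 70, 60)

def highest_threshold_met(identity: float) -> Optional[int]:
    return next((t for t in THRESHOLDS if identity >= t), None)

ref = "ATGGCTAGCAAGGTT-CTG"    # hypothetical aligned fragments
query = "ATGGCAAGCAAGGTTACTG"
pid = percent_identity(ref, query)
print(f"{pid:.1f}% identity; highest threshold met: {highest_threshold_met(pid)}%")
# -> 94.4% identity; highest threshold met: 90%
```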
As will be apparent to one of ordinary skill in the art, the degeneracy of the genetic code allows
for numerous nucleotide substitutions in a given coding sequence which do not affect the amino
acid sequence of the encoded protein. Thus, the present invention also provides for isolated
nucleic acids which differ from any of the above-described sequences only by the substitution of
such synonymous codons.
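Whether two coding sequences differ only by synonymous codons can be checked by translating both and comparing the products. The sketch below builds the standard genetic code (NCBI translation table 1) and compares hypothetical short sequences; it is not drawn from any of the disclosed SEQ ID NOs.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG; "*" = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(cds: str) -> str:
    """Translate an in-frame DNA coding sequence with the standard code."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("length is not a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))

def synonymous(cds_a: str, cds_b: str) -> bool:
    """True if the two sequences have equal length and encode the same protein."""
    return len(cds_a) == len(cds_b) and translate(cds_a) == translate(cds_b)

original = "ATGCTGAGCAAA"   # hypothetical: Met-Leu-Ser-Lys
recoded = "ATGTTATCTAAG"    # different codons, same peptide
missense = "ATGCTGAGCAGA"   # last codon AAA (Lys) changed to AGA (Arg)
print(translate(original), synonymous(original, recoded), synonymous(original, missense))
# -> MLSK True False
```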

The isolated nucleic acids of the present invention may be joined to other nucleic acid sequences
for use in various applications. Thus, for example, the isolated nucleic acids of the invention may
be ligated into cloning or expression vectors, as are commonly known in the art and as described in
the examples below. In addition, the nucleic acids of the invention may be joined in-frame to
sequences encoding another polypeptide so as to form a fusion protein, as is commonly known in
the art and as described in the examples below. Thus, in certain embodiments, the present
invention provides cloning, expression and fusion vectors comprising any of the above-described
nucleic acids.

In another aspect, the isolated nucleic acids of the present invention may comprise only a portion
of a nucleotide sequence encoding a complete mammalian 3-OST protein. For example, and as
described more fully below, the 3-OST-1 proteins comprise a signal sequence which is removed
post-translationally to yield the mature proteins. In some instances (e.g., when translating 3-OST-1
proteins in vitro), it may be preferable to employ an isolated nucleic acid which encodes only the
mature protein. In addition, the four C-terminal residues of 3-OST-1 are believed to be involved in
localization of the protein within the Golgi apparatus. In some instances (e.g., when encoding
3-OST-1 proteins for use in vitro), it may be preferable to employ an isolated nucleic acid which
does not encode these residues, as they will be unnecessary for in vitro function. As described
above, an approximately 260-residue portion of the 3-OST proteins includes the catalytically active
region (ST domain); it may therefore be preferable to employ an isolated nucleic acid which
encodes only this functional fragment, which retains 3-O-sulfotransferase activity. Thus, in certain
preferred embodiments, the present invention provides isolated nucleic acid sequences encoding
mature forms of a mammalian 3-OST-1 protein, C-terminally truncated forms of the 3-OST
proteins, or minimal functional fragments of the 3-OST proteins. In addition, as described above,
these sequences may also encode conservative substitution variants or chimeras of 3-OST proteins,
and may include synonymous codon substitutions.
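The fragments described above (mature form, C-terminal truncation, ST domain only) are all defined by 1-based, inclusive residue ranges, which differ from the 0-based, end-exclusive slices used in most programming languages. The sketch below makes that conversion explicit. The function names are illustrative, and the 311-residue string is a placeholder of the murine 3-OST-1 length, not the real SEQ ID NO: 2.

```python
def residues(protein: str, start: int, end: int) -> str:
    """Residues start..end of a protein, using 1-based inclusive numbering."""
    if not 1 <= start <= end <= len(protein):
        raise ValueError(f"range {start}-{end} is outside 1-{len(protein)}")
    return protein[start - 1:end]  # Python slices are 0-based and end-exclusive

def mature_form(protein: str, signal_length: int = 20) -> str:
    """Remove an N-terminal signal peptide (20 residues for 3-OST-1)."""
    return residues(protein, signal_length + 1, len(protein))

def c_terminal_truncation(protein: str, n: int = 4) -> str:
    """Remove the last n residues (the four putative Golgi-localization residues)."""
    return residues(protein, 1, len(protein) - n)

# Placeholder with the murine 3-OST-1 length (311 residues); NOT SEQ ID NO: 2.
placeholder = "M" + "X" * 310
st_domain = residues(placeholder, 53, 311)  # ST domain range stated for SEQ ID NO: 2
print(len(mature_form(placeholder)), len(c_terminal_truncation(placeholder)), len(st_domain))
# -> 291 307 259
```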



WO 99/22005 



PCT/US98/22597 




In another aspect, the present invention provides for nucleic acids which comprise a sequence of at
least 16-18, preferably 18-20, consecutive nucleotides from any one of SEQ ID NO: 1, SEQ ID NO:
3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9 and SEQ ID NO: 11. Such nucleic acid sequences
have utility for determining the levels of expression of 3-OST transcripts in cells or tissues, for
identifying tissues in which the 3-OST genes are differentially expressed (see above), for encoding
peptide fragments which may be used to raise antibodies to corresponding regions of the 3-OST
proteins, for identifying chromosomes bearing the corresponding 3-OST sequences (see above), for
priming polymerase chain reaction amplification of 3-OST sequences (e.g., prior to in vitro
translation, see below), and for various other utilities which will be apparent to those skilled in the
art. Particularly preferred sequences for PCR amplification include those which are 5' to and/or
include the initiation codon, which are 5' to and/or include the codons encoding the signal peptide
cleavage site, or which are 3' to and/or include the termination codon. Sequences useful for
encoding peptide fragments include those which are located within the coding region.
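As an illustration of the PCR primers described above, a forward primer can be taken from the sense strand starting at the initiation codon, and a reverse primer as the reverse complement of the nucleotides ending at (and including) the termination codon. The sketch below assumes an 18-nucleotide primer length, uses a hypothetical toy transcript rather than any disclosed SEQ ID NO, and computes only GC content; practical primer design would also consider melting temperature and secondary structure.

```python
from typing import Iterator, Tuple

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def gc_fraction(seq: str) -> float:
    """Fraction of G and C in a sequence."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)

def windows(seq: str, k: int = 18) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start, oligo) for every run of k consecutive nucleotides."""
    for i in range(len(seq) - k + 1):
        yield i + 1, seq[i:i + k]

def cds_flanking_primers(seq: str, cds_start: int, cds_end: int, k: int = 18) -> Tuple[str, str]:
    """Forward primer beginning at the initiation codon; reverse primer that is the
    reverse complement of the last k nucleotides ending at cds_end (1-based, inclusive)."""
    forward = seq[cds_start - 1:cds_start - 1 + k]
    reverse = reverse_complement(seq[cds_end - k:cds_end])
    return forward, reverse

# Hypothetical toy transcript: 8 nt 5' UTR, 36 nt coding region (ATG...TGA), 6 nt 3' UTR.
toy = "CCGAGCGC" + "ATG" + "GCC" * 10 + "TGA" + "TTCGAA"
fwd, rev = cds_flanking_primers(toy, cds_start=9, cds_end=44)
print(fwd, f"GC={gc_fraction(fwd):.2f}")
print(rev, f"GC={gc_fraction(rev):.2f}")
print(sum(1 for _ in windows(toy)), "candidate 18-mers in the toy transcript")
```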

Cell Lines and Transgenic Animals

The present invention also provides for cells or cell lines, both prokaryotic and eukaryotic, into
which have been introduced the nucleic acids of the present invention so as to cause clonal
propagation of those nucleic acids and/or expression of the proteins or peptides encoded thereby.
Such cells or cell lines have utility in the propagation and production of the nucleic acids of the
invention, as well as the production of the proteins of the present invention. As used herein, the
term "transformed cell" is intended to embrace any cell, or the descendant of any cell, into which
has been introduced any of the nucleic acids of the invention, whether by transformation,
transfection, transduction, infection, or other means. Methods of producing appropriate vectors,
transforming cells with those vectors, and identifying transformants are well known in the art and
are only briefly reviewed here (see, for example, Sambrook et al. (1989) Molecular Cloning: A
Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New
York).

Prokaryotic cells useful for producing the transformed cells of the invention include members of
the bacterial genera Escherichia (e.g., E. coli), Pseudomonas (e.g., P. aeruginosa), and Bacillus
(e.g., B. subtilis, B. stearothermophilus), as well as many others well known and frequently used
in the art. Prokaryotic cells are particularly useful for the production of large quantities of the
proteins or peptides of the invention (e.g., naturally occurring or synthetic 3-OSTs, fragments of
the 3-OSTs, fusion proteins of the 3-OSTs). Bacterial cells (e.g., E. coli) may be used with a
variety of expression vector systems including, for example, plasmids with the T7 RNA
polymerase/promoter system, bacteriophage λ regulatory sequences, or M13 phage regulatory
elements. Bacterial hosts may also be transformed with fusion protein vectors which create, for
example, Protein A, lacZ, trpE, maltose-binding protein, poly-His tag, or glutathione S-transferase
fusion proteins. All of these, as well as many other prokaryotic expression systems, are well known
in the art and widely available commercially (e.g., pGEX-2T (Amrad, USA) for GST fusions).

Eukaryotic cells and cell lines useful for producing the transformed cells of the invention include
mammalian cells (e.g., endothelial cells, mast cells, COS cells, CHO cells, fibroblasts, hybridomas,
oocytes, embryonic stem cells), insect cell lines (e.g., Drosophila Schneider cells), yeast, and fungi.
Eukaryotic cells are particularly useful for embodiments in which it is necessary that the 3-OST
proteins, or functional fragments thereof, be properly post-translationally modified (e.g.,
N-glycosylated), because N-glycosylation of these proteins appears to be important to their
stability and/or activity. Currently preferred cells are mammalian cells and, in particular, COS-7
cells, CHO cells, murine primary cardiac microvascular endothelial cells (CME), murine mast cell
line C57.1, human primary endothelial cells of umbilical vein (HUVEC), F9 embryonal carcinoma
cells, rat fat pad endothelial cells (RFPEC), L cells (e.g., murine LTA tk- cells), and cells derived
from the transgenic animals of the invention.

To accomplish expression in eukaryotic cells, a wide variety of vectors have been developed and are
commercially available which allow inducible (e.g., LacSwitch expression vectors, Stratagene, La
Jolla, CA) or constitutive (e.g., pcDNA3 vectors, Invitrogen, Chatsworth, CA) expression of 3-OST
nucleotide sequences under the regulation of an artificial promoter element. Such promoter
elements are often derived from CMV or SV40 viral genes, although other strong promoter
elements which are active in eukaryotic cells can also be employed to induce transcription of
3-OST nucleotide sequences. Typically, these vectors also contain an artificial polyadenylation
sequence and 3' UTR which can also be derived from exogenous viral gene sequences or from other
eukaryotic genes. These expression systems are commonly available from commercial sources and
are typified by vectors such as pcDNA3 and pZeoSV (Invitrogen, San Diego, CA). As described
below, the vector pcDNA3 has been successfully used to cause expression of 3-OST-1 proteins in
transfected COS-7 cells. Numerous expression vectors are available from commercial sources to
allow expression of any desired 3-OST transcript in more or less any desired cell type, either
constitutively or after exposure to a certain exogenous stimulus (e.g., withdrawal of tetracycline or
exposure to IPTG).

Vectors may be introduced into the recipient or "host" cells by various methods well known in the
art including, but not limited to, calcium phosphate transfection, strontium phosphate transfection,
DEAE dextran transfection, electroporation, lipofection, microinjection, ballistic insertion on
micro-beads, protoplast fusion or, for viral or phage vectors, by infection with the recombinant
virus or phage.
Transgenic Animal Models 

The present invention also provides for the production of transgenic non-human animal models in
which wild type, allelic variant, chimeric, or antisense 3-OST sequences are expressed, or in which
3-OST sequences have been inactivated or deleted (e.g., "knock-out" constructs) or replaced with
reporter or marker genes (e.g., "knock-in" reporter constructs). The 3-OST sequences may be
conspecific to the transgenic animal (e.g., murine sequences in a transgenic mouse) or transpecific
to the transgenic animal (e.g., human sequences in a transgenic mouse). In such a transgenic animal,
the transgenic sequences may be expressed inducibly, constitutively or ectopically. Expression may
be tissue-specific or organism-wide. Engineered expression of 3-OST sequences in tissues and cells
not normally containing 3-OST gene products may cause novel alterations of heparan
polysaccharide structure and lead to novel cell or tissue phenotypes. Ectopic or altered levels of
expression of 3-OST sequences may alter cell, tissue and/or developmental phenotypes. Transgenic
animals are useful as models of thromboembolic and other disorders arising from defects in
heparan sulfate biosynthesis or metabolism. Transgenic animals are also useful for screening
compounds for their effects on HS biosynthesis mediated by 3-OSTs. Transgenic animals
transformed with reporter constructs may be used to measure the transcriptional effects of small
molecules, drugs, protein physiological mediators, carbohydrate effectors, mimetic compounds or
physical perturbations on the expression of 3-OST loci in vivo. The transgenic animals of the
invention may be used to screen such compounds for therapeutic utility.

Animal species suitable for use in the animal models of the present invention include, but are not
limited to, rats, mice, hamsters, guinea pigs, rabbits, dogs, cats, goats, sheep, pigs, and non-human
primates (e.g., Rhesus monkeys, chimpanzees). For initial studies, transgenic rodents (e.g., mice)
are preferred due to their relative ease of maintenance and shorter life spans. Transgenic
non-human primates may be preferred for longer term studies due to their greater similarity to
humans and their higher cognitive abilities.

Using a nucleic acid disclosed and otherwise enabled herein, there are now several available
approaches for the creation of a transgenic animal. Thus, the enabled animal models include: (1)
animals in which sequences encoding at least a functional fragment of a wild type 3-OST gene have
been recombinantly introduced into the genome of the animal as an additional gene, under the
regulation of either an exogenous or an endogenous promoter element, and as either a minigene
(i.e., a genetic construct of the 3-OST with the introns, if any, removed) or a large genomic
fragment; (2) animals in which sequences encoding at least a functional fragment of a normal
3-OST gene have been recombinantly substituted for one or both copies of the animal's homologous
3-OST gene by homologous recombination or gene targeting; (3) animals in which one or both
copies of one of the animal's homologous 3-OST genes have been recombinantly "humanized" by
the partial substitution of sequences encoding the human homologue by homologous recombination
or gene targeting; (4) animals in which sequences encoding 3-OST transcriptional elements linked
to a reporter gene have replaced the endogenous 3-OST gene and transcriptional elements; (5)
"knock-out" animals in which one or both copies of the animal's 3-OST sequences have been
partially or completely deleted or have been inactivated by the insertion or substitution by
homologous recombination or gene targeting of exogenous sequences (e.g., stop codons); (6)
animals in which additional genes related to the biosynthesis or metabolism of heparan sulfates
have been altered (e.g., a murine transgenic in which all of the genes in the HS pathway have been
humanized). These and other transgenic animals of the invention are useful as models of
thromboembolic and other disorders arising from defects in heparan sulfate biosynthesis or
metabolism. These animals are also useful for screening compounds for their effects on HS
biosynthesis mediated by 3-OSTs.

To produce an animal model (e.g., a transgenic mouse), a wild type or allelic variant 3-OST
sequence or a wild type or allelic variant of a recombinant nucleic acid encoding at least a
functional fragment of a 3-OST is preferably inserted into a germ line or stem cell using standard
techniques of oocyte or embryonic stem cell microinjection, or other form of transformation of such
cells. Alternatively, other cells from an adult organism may be employed. Animals produced by
these or similar processes are referred to as transgenic. Similarly, if it is desired to inactivate or
replace an endogenous 3-OST sequence, homologous recombination using oocytes, embryonic
stem or other cells may be employed. Animals produced by these or similar processes are referred
to as "knock-out" (inactivation) or "knock-in" (replacement) models.

For oocyte injection, one or more copies of the recombinant DNA constructs of the present
invention may be inserted into the pronucleus of a just-fertilized oocyte. This oocyte is then
reimplanted into a pseudo-pregnant foster mother. The liveborn animals are screened for
integrants using analysis of DNA (e.g., from the tail veins of offspring mice) for the presence of
the inserted recombinant transgene sequences. The transgene may be either a complete genomic
sequence introduced into a host as a YAC, BAC or other chromosome DNA fragment, a cDNA with
either the natural promoter or a heterologous promoter, or a minigene containing all of the coding
region and other elements found to be necessary for optimum expression.

To create a transgene, the target sequence of interest (e.g., wild type or allelic variant 3-OST
sequences) is typically ligated into a cloning site located downstream of some promoter element
which will regulate the expression of RNA from the sequence. Downstream of the coding sequence,
there is typically an artificial polyadenylation sequence. An alternative approach to creating a
transgene is to use an exogenous promoter and regulatory sequences to drive expression of the
transgene. Finally, it is possible to create transgenes using large genomic DNA fragments such as
YACs which contain the entire desired gene as well as its appropriate regulatory sequences.

Animal models may be created by targeting endogenous 3-OST sequences in order to alter the
endogenous sequence by homologous recombination. These targeting events can have the effect of
removing endogenous sequence (knock-out) or altering the endogenous sequence to create an amino
acid change associated with human disease or an otherwise abnormal sequence (e.g., a sequence
which is more like the human sequence than the original animal sequence) (knock-in animal
models). A large number of vectors are available to accomplish this, and appropriate sources of
genomic DNA for mouse and other animal genomes to be targeted are commercially available from
companies such as GenomeSystems Inc. (St. Louis, Missouri, USA). The typical feature of these
targeting vector constructs is that 2 to 4 kb of genomic DNA is ligated 5' to a selectable marker
(e.g., a bacterial neomycin resistance gene under its own promoter element, termed a "neomycin
cassette"). A second DNA fragment from the gene of interest is then ligated downstream of the
neomycin cassette but upstream of a second selectable marker (e.g., thymidine kinase). The DNA
fragments are chosen such that mutant sequences can be introduced into the germ line of the
targeted animal by homologous replacement of the endogenous sequences by either one of the
sequences included in the vector. Alternatively, the sequences can be chosen to cause deletion of
sequences that would normally reside between the left and right arms of the vector surrounding the
neomycin cassette. The former is known as a knock-in, the latter is known as a knock-out.
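The targeting vector described above has a fixed element order: a 5' genomic arm, a positive selectable marker (the neomycin cassette), a second genomic fragment, and a negative selectable marker (e.g., thymidine kinase). The sketch below encodes that order as data and checks a candidate design against it. The class, the role labels and all element lengths are hypothetical placeholders; only the element order and the 2 to 4 kb range for the 5' arm come from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Element:
    name: str
    role: str        # "arm", "positive_selection" or "negative_selection"
    length_bp: int

# Element order described in the text, 5' to 3'.
EXPECTED_ROLES = ["arm", "positive_selection", "arm", "negative_selection"]

def check_targeting_vector(elements: list) -> list:
    """Return a list of problems with the layout (empty if it matches the text)."""
    roles = [e.role for e in elements]
    if roles != EXPECTED_ROLES:
        return [f"expected order {EXPECTED_ROLES}, got {roles}"]
    problems = []
    if not 2_000 <= elements[0].length_bp <= 4_000:
        problems.append("5' arm is outside the typical 2 to 4 kb range")
    return problems

# Hypothetical design; lengths are illustrative placeholders only.
vector = [
    Element("5' genomic arm", "arm", 3_000),
    Element("neomycin cassette", "positive_selection", 1_800),
    Element("3' genomic arm", "arm", 2_500),
    Element("thymidine kinase cassette", "negative_selection", 2_000),
]
print(check_targeting_vector(vector))   # -> []
```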

Retroviral infection of early embryos can also be done to insert the recombinant DNA constructs of
the invention. In this method, the transgene (e.g., a wild type or allelic variant 3-OST sequence) is
inserted into a retroviral vector which is used to directly infect embryos (e.g., mouse or non-human
primate embryos) during the early stages of development to generate partially transgenic animals,
some of which bear the transgenes in germline cells.

Alternatively, homologous recombination using a population of stem cells allows for the screening
of the population for successful transformants. Once identified, these can be injected into
blastocysts, and a proportion of the resulting animals will show germline transmission of the
transgene.

Techniques for generating transgenic animals, as well as techniques for homologous
recombination or gene targeting, are now widely accepted and practiced. A laboratory manual on
the manipulation of the mouse embryo, for example, is available detailing standard laboratory 
techniques for the production of transgenic mice (69). 

Finally, equivalents of transgenic animals, including animals with mutated or inactivated 3-OST
sequences, may be produced using chemical or X-ray mutagenesis of gametes, followed by
fertilization. Using the isolated nucleic acids disclosed or otherwise enabled herein, one of
ordinary skill may more rapidly screen the resulting offspring by, for example, direct sequencing,
SSCP, RFLP, PCR, or hybridization analysis to detect mutants, or by Southern blotting to
demonstrate loss of one allele by dosage.
Identifying Modulators of 3-OST Expression 

In another set of embodiments, the present invention provides isolated nucleic acids
comprising the genetic regulatory sequences of a 3-OST gene operably joined to a marker gene.
Such regulatory sequences include 5' untranslated regions such as promoter and operator
sequences. The 5' regulatory sequences of the human 3-OST-4 gene (as well as coding regions)
are disclosed herein as SEQ ID NO: 16. Such regulatory regions may be used to transform host
cells, which are useful in methods of identifying compounds capable of modulating the expression
of the 3-OST gene. Thus, in such methods, a candidate compound is contacted with a host cell
transformed with a marker gene operably joined to the 3-OST regulatory regions, and changes in
expression of the marker gene are indicative of the ability of the candidate compound to modulate
3-OST expression. Such methods may also be performed using the transgenic animals of the
invention.
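As a minimal illustration of how a change in marker expression might be scored, the sketch below compares the marker signal from compound-treated host cells with that of untreated cells carrying the same construct. The two-fold cutoff and the function names are hypothetical choices made for this example; they are not values taken from the disclosure.

```python
import math


def fold_change(treated_signal, baseline_signal):
    """Marker signal in compound-treated cells divided by the untreated baseline."""
    if baseline_signal <= 0:
        raise ValueError("baseline signal must be positive")
    return treated_signal / baseline_signal


def classify_compound(treated_signal, baseline_signal, threshold=2.0):
    """Label a candidate compound using a hypothetical symmetric fold-change cutoff."""
    fc = fold_change(treated_signal, baseline_signal)
    if fc >= threshold:
        return "inducer"
    if fc <= 1.0 / threshold:
        return "inhibitor"
    return "no significant effect"


# Example: marker signal (arbitrary units) from treated vs. untreated cultures.
print(classify_compound(300.0, 100.0))
print(math.log2(fold_change(25.0, 100.0)))  # log2 scale: negative values mean repression
```

Reporting results on a log2 scale makes a two-fold induction (+1) and a two-fold repression (-1) symmetric, which is convenient when many compounds are compared.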

Substantially Pure Proteins 
In one aspect, the present invention provides substantially pure preparations of 3-OST
proteins. In preferred embodiments, the 3-OST proteins are 3-OST-1, 3-OST-2, 3-OST-3A,
3-OST-3B, 3-OST-4 or ce3-OST proteins. In particularly preferred embodiments, the 3-OST
proteins are those disclosed as SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8,
SEQ ID NO: 10, SEQ ID NO: 12, or SEQ ID NO: 15. As shown in the examples below, nucleic
acids encoding all or a portion of one mammalian 3-OST protein may be used to isolate
homologues in other species by standard techniques known to those of ordinary skill in the art.
Thus, the present invention also enables substantially pure protein preparations of 3-OST proteins 
of other mammalian species including, for example, rats, goats, sheep, cows, pigs, and non-human 
primates. Similarly, the isolated nucleic acids disclosed herein may be used to screen additional
human or other mammalian genetic libraries (e.g., genomic or cDNA libraries) to identify allelic
variants of the particularly disclosed sequences. Thus, the present invention also enables 
substantially pure protein preparations of human and other mammalian 3-OST allelic variants. 

In another aspect, the present invention provides 3-OST protein variants in which 
conservative substitutions have been made for certain residues, or chimeric 3-OST proteins in
which the sequences of various 3-OST proteins have been mixed, to produce non-naturally
occurring variants which retain 3-O-sulfotransferase activity. Conservative substitutions are
preferably made in those regions of the proteins which are already known to vary between the
human and murine sequences (see Figure 1) or among the 3-OST-1, 3-OST-2, 3-OST-3A,
3-OST-3B, 3-OST-4, and ce3-OST proteins (see, e.g., Figure 2). Substitutions should be avoided in
those areas which have been implicated in catalysis (see above). Chimeric 3-OST proteins may be
made using the disclosed sequences as reference sequences, and these chimeras may also be
subjected to conservative substitutions as described above. In addition, based upon the
homologies of the 3-OST proteins to other glucosaminyl sulfotransferases (e.g., 2-OST, NST-1,
NST-2), one of ordinary skill in the art may produce chimeric 3-OSTs using those proteins as
reference sequences (see, e.g., Figure 2).

In preferred embodiments, the 3-OST proteins have at least 60%, preferably at least
70%, and more preferably at least 80% amino acid sequence similarity to the mammalian 3-OST
sequences particularly disclosed herein, and retain 3-O-sulfotransferase activity. Most preferably,
the sequences have at least 90% or 95% amino acid sequence similarity to the disclosed reference
sequences. Such sequences may be routinely produced by those of ordinary skill in the art, and
3-O-sulfotransferase activity may be tested by routine methods such as those disclosed herein.
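The percentage thresholds above presuppose a way of scoring two aligned sequences. The sketch below, offered only as an illustration under stated assumptions, computes percent identity and a simple percent similarity over the gap-free columns of a pairwise alignment that has already been produced by a standard alignment program. The conservative-substitution groups are a hypothetical simplification; similarity is more commonly derived from a substitution matrix such as BLOSUM62, and some conventions divide by the full reference length rather than by the aligned columns.

```python
# Hypothetical "conservative" groups for illustration only; a substitution matrix
# such as BLOSUM62 is the more usual basis for scoring similarity.
CONSERVATIVE_GROUPS = [set("ILMV"), set("FWY"), set("KRH"), set("DE"), set("NQ"), set("ST"), set("AG")]


def _similar(a, b):
    return a == b or any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def percent_identity_and_similarity(aligned_a, aligned_b):
    """Score two equal-length aligned sequences ('-' = gap) over gap-free columns only."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0, 0.0
    identical = sum(a == b for a, b in columns)
    similar = sum(_similar(a, b) for a, b in columns)
    return 100.0 * identical / len(columns), 100.0 * similar / len(columns)


def preference_tier(similarity):
    """Highest threshold named in the text (60, 70, 80, 90, 95%) that the score meets, or None."""
    for cutoff in (95, 90, 80, 70, 60):
        if similarity >= cutoff:
            return cutoff
    return None


# Toy alignment (not a real 3-OST pair); the gap column is ignored.
identity, similarity = percent_identity_and_similarity("MKLDE-AG", "MRIDEQSG")
print(round(identity, 1), round(similarity, 1), preference_tier(similarity))
```

A variant would still need to be tested for 3-O-sulfotransferase activity; the score only tells whether it falls inside one of the claimed similarity ranges.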

The substantially pure proteins of the present invention may be joined to other polypeptide
sequences for use in various applications. Thus, for example, the proteins of the invention may be
joined to one or more additional polypeptides so as to form a fusion protein, as is commonly
known in the art and as described in the examples below. The additional polypeptides may be
joined to the N-terminus, C-terminus or both termini of the 3-OST protein. Such fusion proteins
may be particularly useful if the additional polypeptide sequences are easily identified (e.g., by
providing an antigenic determinant) or easily purified (e.g., by providing a ligand for affinity 
purification). 

In another aspect, the substantially pure 3-OST proteins of the present invention may 
comprise only a portion or fragment of the amino acid sequence of a complete mammalian 3-OST
protein. For example, as described above, the 3-OST-1 proteins comprise a twenty amino acid
signal sequence which is removed post-translationally to yield the mature proteins. In some
instances (e.g., when employing 3-OST-1 proteins in vitro), it may be preferable to employ only
the mature protein or a minimal fragment retaining 3-O-sulfotransferase activity. In addition, the
four C-terminal residues of 3-OST-1 may be involved in localization of the protein within the
Golgi apparatus. In some instances (e.g., when employing 3-OST-1 proteins in vitro), it may be
preferable to employ a 3-OST-1 protein which does not include these residues, as they will be
unnecessary for in vitro function. As described above, an approximately 260 amino acid portion
of the 3-OST proteins includes the catalytically active region and, therefore, it may be preferable
to employ a 3-OST protein which includes only this functional fragment which retains
3-O-sulfotransferase activity. Thus, in certain preferred embodiments, the present invention provides
substantially pure 3-OST proteins including mature forms of a mammalian 3-OST-1 protein,
C-terminally truncated forms, or minimal functional fragments thereof. In addition, as described
above, these proteins may also comprise conservative substitution variants or chimeras of 3-OST
proteins.

In another aspect, the present invention provides for substantially pure protein
preparations which comprise a sequence of at least 6-12, preferably 10-16, more preferably 16-22
consecutive amino acid residues from any one of SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6,
SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 12, and SEQ ID NO: 15. Such polypeptides have
utility to raise antibodies to corresponding regions of the 3-OST proteins. In particular, an 
analysis of the amino acid sequences of the 3-OST proteins suggests that there are regions which 
will have particular utility in generating antibodies. Thus, in preferred embodiments, the
invention provides antigenic 3-OST polypeptides selected from the group consisting of (a)
residues 4-29, 144-152, 208-222, 31-42, 155-181, 72-94, 195-205, 278-293, 113-136, 56-66,
230-245, 257-263, 301-306, 267-272 and 101-107 of SEQ ID NO: 2; (b) residues 4-22, MO-MS,
205-218, 68-90, 191-201, 274-289, 110-133, 51-62, 226-241, 253-259, 151-163, 168-181,
297-302, 27-34, 97-107 and 263-268 of SEQ ID NO: 4; (c) residues 18-44, 199-207, 114-123,
319-328, 250-275, 238-246, 128-143, 47-59, 83-98, 332-349, 178-186, 289-295, 310-316,
63-76, 4-9, 209-218, 170-176 and 300-305 of SEQ ID NO: 6; (d) residues 22-57, 236-256,
166-186, 151-161, 138-147, 77-85, 348-354, 87-94, 323-335, 360-366, 284-314, 217-224,
376-383, 4-20, 130-136, 67-73, 389-395 and 338-343 of SEQ ID NO: 8; (e) residues 221-241,
8-66, 151-171, 135-146, 333-339, 308-320, 345-351, 269-299, 202-209, 361-368, 86-100,
71-80, 115-129, 374-380 and 323-328 of SEQ ID NO: 10; and (f) residues 280-290, 321-364,
371-388, 211-231, 393-399, 310-316, 421-438, 405-411, 262-268 and 292-301 of SEQ ID NO: 12.
Note that these polypeptides are listed in decreasing order of preference within each of groups
(a) to (f). Preferred antigenic peptide sequences also include residues 218-231, 87-100, 167-180
and 275-288 of SEQ ID NO: 2, which have been successfully used to generate antibodies to
m3-OST-1.
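Because the regions above are given as 1-based, inclusive residue ranges, retrieving the corresponding peptides from a protein sequence is a simple slicing exercise. The sketch below assumes the sequence is available as a one-letter string; the demonstration sequence is a hypothetical placeholder, since SEQ ID NO: 2 is not reproduced in this section. It also enumerates all consecutive windows of a chosen length, as contemplated for the 6-22 residue immunogens.

```python
def parse_ranges(text):
    """Turn a list such as "218-231, 87-100" into [(218, 231), (87, 100)]."""
    ranges = []
    for item in text.split(","):
        start, end = (int(part) for part in item.strip().split("-"))
        ranges.append((start, end))
    return ranges


def extract_peptide(sequence, start, end):
    """Return residues start..end using 1-based, inclusive numbering."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError(f"range {start}-{end} lies outside a sequence of length {len(sequence)}")
    return sequence[start - 1:end]


def windows(sequence, length):
    """All consecutive peptides of the given length, in order along the sequence."""
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


# Hypothetical 300-residue placeholder standing in for a real 3-OST sequence.
placeholder = "ACDEFGHIKLMNPQRSTVWY" * 15
for start, end in parse_ranges("218-231, 87-100, 167-180, 275-288"):
    peptide = extract_peptide(placeholder, start, end)
    print(f"{start}-{end}: {peptide} ({len(peptide)} residues)")
```

Using 1-based, inclusive numbering in `extract_peptide` keeps the code aligned with how residue positions are written in the text, so ranges can be pasted in without off-by-one adjustments.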

Thus, in another aspect, the present invention provides for antibodies and methods for
making antibodies which selectively bind with the 3-OST proteins. These antibodies include
monoclonal and polyclonal antibodies, as well as functional antibody fragments such as F(ab) and 
Fc. 

The proteins or peptides of the invention may be substantially purified by any of a variety 
of methods selected on the basis of the properties revealed by their protein sequences. As shown
in the examples below, and previously described (26), cells naturally expressing 3-OST-1 proteins
secrete the protein when grown in culture, and the proteins may be isolated from the cell culture
medium. The 3-OST-2, 3-OST-3A, 3-OST-3B and 3-OST-4 proteins, however, appear to
include transmembrane domains. Thus, these proteins are not expected to be secreted at high
levels. Because the 3-OSTs are found in the Golgi apparatus and microsomal bodies of cells
which naturally express them, a fraction of cells including these organelles may be isolated and the
proteins may be extracted from this fraction by, for example, detergent solubilization.
Alternatively, the 3-OST proteins, fusion proteins, or fragments thereof may be purified from cells
transformed or transfected with expression vectors. For example, insect cells such as Drosophila 
Schneider cells and baculovirus expression systems may be employed with vectors such as 
pPBLUEBAC and pMELBAC (Stratagene, La Jolla, CA); yeast expression systems with vectors
such as pYESHIS Xpress vectors (Invitrogen, San Diego, CA); eukaryotic expression systems
with vectors such as pcDNA3 (Invitrogen, San Diego, CA), which causes constitutive expression, 
or LacSwitch (Stratagene, La Jolla, CA) which is inducible; or prokaryotic expression systems 
with vectors such as pKK233-3 (Clontech, Palo Alto, CA). In the event that the protein or 
fragment localizes within microsomes derived from the Golgi apparatus, endoplasmic reticulum,
or other membrane-containing structures of such cells, the protein may be purified from the
appropriate cell fraction. Alternatively, if the protein does not localize within these structures, or
aggregates in inclusion bodies within the recombinant cells (e.g., prokaryotic cells), the protein 
may be purified from whole lysed cells or from solubilized inclusion bodies by standard means. 

Purification can be achieved using standard protein purification procedures including, but
not limited to, affinity chromatography, gel-filtration chromatography, ion-exchange
chromatography, high-performance liquid chromatography (RP-HPLC, ion-exchange HPLC,
size-exclusion HPLC), high-performance chromatofocusing chromatography, hydrophobic interaction
chromatography, immunoprecipitation, or immunoaffinity purification. Gel electrophoresis (e.g., 
PAGE, SDS-PAGE) can also be used to isolate a protein or peptide based on its molecular
weight, charge properties and hydrophobicity.

A 3-OST protein, or a fragment thereof, may also be conveniently purified by creating a 
fusion protein including the desired 3-OST sequence fused to another peptide such as an antigenic 
determinant (e.g., from Protein A, see below) or poly-His tag (e.g., QIAexpress vectors, 
QIAGEN Corp., Chatsworth, CA), or a larger protein (e.g., GST using the pGEX-27 vector
(Amrad, USA) or green fluorescent protein using the Green Lantern vector (GIBCO/BRL,
Gaithersburg, MD)). The fusion protein may be expressed and recovered from prokaryotic or
eukaryotic cells and purified by any standard method based upon the fusion vector sequence. For 
example, the fusion protein may be purified by immunoaffinity or immunoprecipitation with an 
antibody to the non-3-OST portion of the fusion or, in the case of a poly-His tag, by affinity
binding to a nickel column. The desired 3-OST protein or fragment can then be further purified
from the fusion protein by enzymatic cleavage of the fusion protein. Methods for preparing and
using such fusion constructs for the purification of proteins are well known in the art and
numerous kits are now commercially available for this purpose. 

Currently preferred methods for small-scale purification of 3-OST-1 proteins from the
media of LTA cells grown in culture may be found in Liu et al. (26), and methods for purification
of 3-OSTs produced recombinantly in COS-7 cells, CHO cells, murine primary cardiac
microvascular endothelial cells (CME), murine mast cell line C57.1, and human primary
endothelial cells of umbilical vein (HUVEC) may be found in the examples below. These methods
may also be adapted for use with other cell and expression systems to obtain substantially pure
3-OST proteins.

In another aspect, the present invention provides for methods for producing the
above-described proteins. Thus, in one set of embodiments, the isolated nucleic acids of the invention
may be used to transform host cells or create transgenic animals. The proteins of the invention 
may then be substantially purified by well known methods including, but not limited to, those 
described in the examples below. Alternatively, the isolated nucleic acids of the invention may be
used in cell-free in vitro translation systems. Such systems are also well known in the art and
include, but are not limited to, that described in the examples below. 
Antibodies 

The present invention also provides antibodies and methods of making antibodies, 
which will selectively bind to and, thereby, isolate or identify wild-type and/or variant forms of the
3-OST proteins. The antibodies of the invention have utility as laboratory reagents for, inter alia,
immunoaffinity purification of the 3-OSTs, immunoaffinity purification of 3-OST conjugates or 
complexes (e.g., 3-OST-AT, 3-OST-HS), Western blotting to identify cells or tissues expressing 
the 3-OSTs, and immunocytochemistry or immunofluorescence techniques to establish the cellular 
or extracellular location of the protein. 

The antibodies of the invention may be generated using the entire 3-OST proteins of
the invention or using any 3-OST epitope which is characteristic of that protein and which
substantially distinguishes it from other host proteins. Such epitopes may be identified by 
comparing sequences of amino acid residues from a 3-OST sequence to computer databases of 
protein sequences from the relevant host. Preferably, the epitopes are chosen so as to be highly
immunogenic and specific.

In a preferred embodiment, the immunogen/epitope is a protein sequence of at least
6-12, preferably 10-16, more preferably 16-22 consecutive amino acid residues encoded by the
disclosed 3-OST genes. In particular, an analysis of the amino acid sequences of the 3-OST proteins suggests
that there are regions which will have particular utility in generating antibodies. Thus, in
preferred embodiments, the invention provides antigenic 3-OST polypeptides.

3-OST immunogen preparations may be produced from crude extracts (e.g., 
microsomal fractions of cells expressing the proteins), from proteins or peptides substantially 
purified from cells which naturally or recombinantly express them or, for small immunogens, by 
chemical peptide synthesis. The 3-OST immunogens may also be in the form of a fusion protein
in which the non-3-OST region is chosen for its adjuvant properties and/or its ability to
facilitate purification. As used herein, a 3-OST immunogen shall be defined as a
preparation including a peptide comprising at least 4-8, and preferably at least 9-15 consecutive 
amino acid residues of the 3-OST proteins or nucleic acids encoding such a peptide coupled with 
transcriptional elements, as disclosed or otherwise enabled herein. Therefore, any 3-OST derived 
polypeptide or protein sequences which are employed to generate antibodies to the 3-OSTs 
should be regarded as 3-OST immunogens. 

The antibodies of the invention may be polyclonal or monoclonal, or may be antibody 
fragments, including Fab fragments, F(ab')2, and single chain antibody fragments. In addition,
after identifying useful antibodies by the method of the invention, recombinant antibodies may be
generated, including any of the antibody fragments listed above, as well as humanized antibodies
based upon non-human antibodies to the 3-OST proteins. In light of the present disclosures of
3-OST proteins, as well as the characterization of other 3-OSTs enabled herein, one of ordinary skill
in the art may produce the above-described antibodies by any of a variety of standard means well
known in the art. For an overview of antibody techniques, see Antibody Engineering, 2nd Ed.,
Borrebaeck, ed., Oxford University Press, Oxford (1995).

As a general matter, monoclonal anti-3-OST antibodies may be produced by first 
injecting a mouse, rabbit, goat or other suitable animal with a 3-OST immunogen in a suitable 
carrier or diluent. As above, carrier proteins or adjuvants may be utilized and booster injections 
(e.g., bi- or tri-weekly over 8-10 weeks) are recommended. After allowing for development of a 
humoral response, the animals are sacrificed and their spleens are removed and resuspended in, 
for example, phosphate buffered saline (PBS). The spleen cells serve as a source of lymphocytes, 
some of which are producing antibody of the appropriate specificity. These cells are then fused 
with an immortalized cell line (e.g., myeloma), and the products of the fusion are plated into a 
number of tissue culture wells in the presence of a selective agent such as HAT. The wells are 
serially screened and replated, each time selecting cells making useful antibody. Typically, several
screening and replating procedures are carried out until over 90% of the wells contain single 
clones which are positive for antibody production. Monoclonal antibodies produced by such 
clones may be purified by standard methods such as affinity chromatography using Protein A
Sepharose, by ion-exchange chromatography, or by variations and combinations of these
techniques. 

The antibodies of the invention may be labeled or conjugated with other compounds or 
materials for diagnostic and/or therapeutic uses. For example, they may be coupled to 
radionuclides, fluorescent compounds, or enzymes for imaging or therapy, or to liposomes for the
targeting of compounds contained in the liposomes to a specific tissue location.
Assays for Drugs Which Affect 3-OST Expression 

In another series of embodiments, the present invention provides assays for identifying 
small molecules or other compounds which are capable of inducing or inhibiting the expression of 
the 3-OST genes and proteins. The assays may be performed in vitro using non-transformed
cells, established cell lines, or the transformed cells of the invention, or in vivo using normal
non-human animals or the transgenic animal models of the invention.

In particular, the assays may detect the presence of increased or decreased expression 
of nucleic acids under the transcriptional control of 3-OST promoter and regulatory sequences on 
the basis of increased or decreased mRNA expression (using, e.g., the nucleic acid probes
disclosed and enabled herein), increased or decreased levels of protein products encoded by such
nucleic acids (using, e.g., the anti-3-OST antibodies disclosed and enabled herein), or increased or
decreased levels of activity of such a protein (e.g., β-galactosidase or luciferase).

Thus, for example, one may culture cells known to express a particular 3-OST, or cells
recombinantly modified to express at least a functional fragment or epitope of a 3-OST protein
under the transcriptional control of a 3-OST promoter, and add to the culture medium one or more
test compounds. After allowing a sufficient period of time (e.g., 0-72 hours) for the compound to
induce or inhibit the expression of the 3-OST, any change in levels of expression from an
established baseline may be detected using any of the techniques well known in the art. Using the
nucleic acid probes and/or antibodies disclosed and enabled herein, detection of changes in the
expression of a 3-OST, and thus identification of the compound as an inducer or inhibitor of
3-OST expression, requires only routine experimentation. For example, one may assay for 3-OST
activity by measuring the conversion of HSinact into HSact by methods known in the art (70).
In other embodiments, a recombinant assay is employed in which a reporter gene is
operably joined to 3-OST promoter and regulatory sequences so as to be under the transcriptional
control of these sequences. The reporter gene may be any gene which encodes a transcriptional
or translational product which is readily assayed or which has a readily determinable effect or
phenotype. Preferred reporter genes are those encoding proteins with readily detectable activity,
including without limitation β-galactosidase, green fluorescent protein, alkaline phosphatase, or
luciferase, operably joined to the 5' regulatory regions of a 3-OST gene. The 3-OST regulatory
regions may be readily isolated and cloned by one of ordinary skill in the art in light of the present
disclosure of the coding regions of these genes. The reporter gene and regulatory regions are
joined in-frame (or in each of the three possible reading frames) so that transcription and
translation of the reporter gene may proceed under the control of the 3-OST regulatory elements.
The recombinant construct may then be introduced into any appropriate host cell as described
herein. The transformed cells may be grown in culture and, after establishing the baseline level of
expression of the reporter gene, test compounds may be added to the medium. The ease of
detecting expression of the reporter gene provides for a rapid, high-throughput assay for
the identification of inducers and inhibitors of the 3-OST gene.
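In a high-throughput format, the baseline level of reporter expression is typically estimated from replicate untreated wells, and each test compound is compared against it. The sketch below assumes that scheme and uses a z-score cutoff of 3, which is a common but here hypothetical choice; the well names and signal values are invented for illustration.

```python
from statistics import mean, stdev


def z_scores(baseline_wells, treated_wells):
    """Express each treated reporter signal in standard deviations from the baseline mean."""
    mu = mean(baseline_wells)
    sd = stdev(baseline_wells)  # sample standard deviation; needs at least two wells
    if sd == 0:
        raise ValueError("baseline wells show no variation; z-scores are undefined")
    return {name: (signal - mu) / sd for name, signal in treated_wells.items()}


def call_hits(baseline_wells, treated_wells, cutoff=3.0):
    """Keep only compounds whose signal departs from baseline by at least `cutoff` SDs."""
    hits = {}
    for name, z in z_scores(baseline_wells, treated_wells).items():
        if z >= cutoff:
            hits[name] = "inducer"
        elif z <= -cutoff:
            hits[name] = "inhibitor"
    return hits


baseline = [100.0, 102.0, 98.0, 101.0, 99.0]  # untreated reporter signal (arbitrary units)
treated = {"compound_A": 150.0, "compound_B": 80.0, "compound_C": 101.0}
print(call_hits(baseline, treated))
```

Hits from such a primary screen would then be confirmed in dose-response experiments and, as described below, in the animal models.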

Compounds identified by this method will have potential utility in modifying the 
expression of the 3-OST genes in vivo. These compounds may be further tested in the animal 
models disclosed and enabled herein to identify those compounds having the most potent in vivo
effects.

Methods for Heparan Modification 

In another aspect, the present invention provides methods for 3-O-sulfating saccharide
residues within a preparation of glycosaminoglycan or proteoglycan polysaccharides in which the
polysaccharides include a polysaccharide sequence of GlcA→GlcNS±6S. These methods
comprise contacting the GlcA→GlcNS±6S-containing polysaccharide preparation with 3-OST
protein in the presence of a sulfate donor under conditions which permit the 3-OST to convert the
GlcA→GlcNS±6S sequence to GlcA→GlcNS3S±6S. In particular embodiments, the
GlcA→GlcNS±6S sequence comprises a part of an HSact precursor sequence (i.e.,
GlcA→GlcNS±6S→IdoA2S→GlcNS±6S or IdoA→GlcNAc6S→GlcA→GlcNS±6S→IdoA2S→GlcNS6S)
or a part of an HSinact precursor sequence (i.e., IdoA→GlcNS6S→GlcA→GlcNS±6S→IdoA2S→GlcNS6S;
IdoA→GlcNAc→GlcA→GlcNS±6S→IdoA2S→GlcNS6S; IdoA→GlcNS→GlcA→GlcNS±6S→IdoA2S→GlcNS6S;
IdoA→GlcNAc6S→GlcA→GlcNS±6S→IdoA2S→GlcNS; or IdoA→GlcNS6S→GlcA→GlcNS±6S→IdoA2S→GlcNS).
Conversion of the HSact precursor pool to HSact increases the fraction with AT-binding
activity and is particularly useful in the production of anticoagulant heparan sulfate
products. Thus, in another embodiment, the present invention provides for means of enriching the
AT-binding fraction of a heparan sulfate pool by contacting the polysaccharide preparation with
3-OST protein in the presence of a sulfate donor under conditions which permit 3-OST-mediated
conversion to HSact. In preferred embodiments, the sulfate donor is 3'-phosphoadenosine
5'-phosphosulfate (PAPS).
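The substrate rule described above (3-OST converts GlcA→GlcNS±6S to GlcA→GlcNS3S±6S, with PAPS as the sulfate donor) can be modeled on a chain written as a list of residue names, read in the same direction as the arrows in the text. The sketch below applies only that minimal disaccharide rule; the residue-name strings are a hypothetical encoding, and the sketch does not capture any additional sequence context a particular 3-OST isoform may require.

```python
def three_o_sulfate(chain):
    """Return (new_chain, positions) after 3-O-sulfating each GlcNS or GlcNS6S preceded by GlcA.

    Each conversion would consume one PAPS, releasing PAP as the by-product.
    """
    new_chain = list(chain)
    converted = []
    for i in range(len(new_chain) - 1):
        if new_chain[i] == "GlcA" and new_chain[i + 1] in ("GlcNS", "GlcNS6S"):
            # Add the 3S modifier, e.g. GlcNS6S -> GlcNS3S6S.
            new_chain[i + 1] = new_chain[i + 1].replace("GlcNS", "GlcNS3S", 1)
            converted.append(i + 1)
    return new_chain, converted


# An HSact precursor of the form IdoA→GlcNAc6S→GlcA→GlcNS±6S→IdoA2S→GlcNS6S (6-O-sulfated variant).
precursor = ["IdoA", "GlcNAc6S", "GlcA", "GlcNS6S", "IdoA2S", "GlcNS6S"]
product, sites = three_o_sulfate(precursor)
print(" → ".join(product))
print(f"PAPS consumed: {len(sites)}")
```

Only the glucosamine immediately following GlcA is modified; the IdoA2S→GlcNS6S unit at the reducing end is left untouched because it lacks the GlcA neighbor.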

Methods of Partially Sequencing Complex Polysaccharides

In another aspect, the present invention provides methods for partially sequencing 
complex polysaccharides such as heparan sulfates (HS) or other glycosaminoglycans (GAGs). In 
these methods, a pool of polysaccharides which includes sequences which may be 3-O-sulfated is 
contacted with a 3-OST protein in the presence of a sulfate donor (e.g., PAPS) under conditions
which permit sulfation by 3-OST. The treated polysaccharides are then subjected to degradation
by enzymes which degrade polysaccharides in a sequence-specific manner (e.g., polysaccharide
lyases such as heparinase I, II or III) and the size profile of the resulting fragments is determined. An
identical pool which has not been treated with 3-OST is similarly cleaved by the same enzymes
and a size profile determined. Changes in the size profiles indicate that 3-OST activity has
modified the saccharide units so as to prevent (or permit) cleavage at sites which previously were
(or were not) cleaved. Thus, comparison of the profiles will indicate positions at which the target 
sequences for 3-OST activity are present and provide a partial polysaccharide sequence. 
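The comparison of size profiles can be sketched computationally. The example below assumes a deliberately simplified, hypothetical cleavage rule (a lyase cuts the linkage from a GlcNS-containing residue to the following uronic acid unless that glucosamine carries a 3-O-sulfate); this is not the actual specificity of heparinase I, II or III. Chains are lists of residue names, and the treated chain is written out directly to stand in for the 3-OST-modified pool.

```python
from collections import Counter

URONIC_ACIDS = {"GlcA", "IdoA", "IdoA2S"}


def hypothetical_lyase(left, right):
    """HYPOTHETICAL rule: cut GlcNS(±6S)→uronic acid linkages unless the GlcN is 3-O-sulfated."""
    return left.startswith("GlcNS") and "3S" not in left and right in URONIC_ACIDS


def cut_sites(chain, cleavable):
    """Indices i such that the linkage between chain[i] and chain[i + 1] is cleaved."""
    return {i for i in range(len(chain) - 1) if cleavable(chain[i], chain[i + 1])}


def fragment_sizes(chain, cleavable):
    """Lengths (in residues) of the fragments left after complete digestion."""
    sizes, start = [], 0
    for i in sorted(cut_sites(chain, cleavable)):
        sizes.append(i + 1 - start)
        start = i + 1
    sizes.append(len(chain) - start)
    return sizes


untreated = ["GlcA", "GlcNS6S", "IdoA2S", "GlcNS6S", "IdoA", "GlcNS"]
treated = ["GlcA", "GlcNS3S6S", "IdoA2S", "GlcNS6S", "IdoA", "GlcNS"]  # after 3-OST action

print(Counter(fragment_sizes(untreated, hypothetical_lyase)))  # Counter({2: 3})
print(Counter(fragment_sizes(treated, hypothetical_lyase)))    # Counter({4: 1, 2: 1})
# Sites cleaved only in the untreated pool point to the newly 3-O-sulfated position.
print(cut_sites(untreated, hypothetical_lyase) - cut_sites(treated, hypothetical_lyase))  # {1}
```

In practice the size profiles would come from gel or chromatographic separation of the digests; the set difference of cleavage sites is the computational analogue of locating the bands that shift.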

In another embodiment, the sequence of complex polysaccharides such as HS or GAG 
may be partially determined using sequence-specific polysaccharide affinity fractionation. To this
end, 3-OST proteins which lack enzymatic function can be identified or produced (e.g., by altering or
deleting a portion of the catalytic ST domain by site-directed or deletion mutagenesis). These
inactive forms will bind GAGs in a sequence-dependent manner. For example, the 3-OST-1
protein normally binds, at a minimum, a GAG sequence containing GlcA→GlcNS±6S. When the
active site of this protein is neutralized, the Kd of the protein for these sequences will be relatively
unaffected. This reagent will allow sequence-specific saccharide affinity fractionation from
complex mixtures of GAGs. The purified structures can be degraded in a stepwise fashion with
exolytic or endolytic enzymes and/or nitrous acid, and the resulting degradation products can be
compared to standard compounds of known structure. This method will allow the quantitation
and characterization of known structures contained within unknown complex polysaccharide 
samples. 

In another embodiment, partial sequence can be obtained using the 3-OSTs of the
invention or other heparan sulfate sequence-specific binding ligands as protective groups prior to
treating the HS or GAG with modifying agents that detectably alter the HS or GAG. Useful
protective groups include catalytically inactive enzymes, chimeric enzymes and small molecule
ligands with identified sequence binding specificities. The protecting group is contacted with the
heparan or other glycosaminoglycans (GAGs), and the resultant complex is treated with one or
more modifying agents. Useful modifying agents include catalytically active heparan lyases,
sulfotransferases, N-deacetylases, epimerases, or chimeric proteins of the invention. In
embodiments where multiple protecting groups and/or modifying reagents are used in
combination, the sample is first contacted with the protective group, and then each modifying reagent
may be contacted with the protected polysaccharide, either simultaneously or in turn. The
protective group will interfere with the ability of a chemically modifying agent to interact with,
attach to and/or cleave specific GAG sequence motifs. The sample can then be analyzed for
ligand-specific protection and/or cleavage to elucidate the sequence of the original GAG, using
separation and/or quantitation methods known in the art.

In some embodiments, as a preliminary step, full-length heparans and GAG oligomers can
be fractionated over an immobilized affinity ligand immobilized at their reducing ends via
hydrazide chemistry. The fraction of GAG captured by the immobile phase permits a quantitation
of the mass or total percent of the target sequence (out of total GAG). Thus, unique heparan or
other GAG structures may be concentrated and/or specifically eluted for further analysis. 

One useful method for detecting binding is the Biomolecular Interaction Assay or
"BIAcore" system developed by Pharmacia Biosensor and described in the manufacturer's
protocol (LKB Pharmacia, Sweden). In light of the present disclosure, one of ordinary skill in the
art is now enabled to employ this system, or a substantial equivalent, to identify proteins or other
compounds having sequence-specific HS or GAG binding capacity, or HS or GAG sequences
having 3-OST binding capacity. Such systems utilize surface plasmon resonance, an optical
phenomenon that detects changes in refractive indices. A sample of interest is passed over an
immobilized ligand (e.g., a 3-OST fusion protein or specific GAG) and binding interactions are 
registered as changes in the refractive index. 
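For equilibrium binding measured by surface plasmon resonance, a common first-pass analysis assumes a 1:1 (Langmuir) interaction in which the steady-state response is R = Rmax·C/(KD + C). The sketch below, under that assumption, estimates KD and Rmax from steady-state responses by a linear fit of 1/R against 1/C. The double-reciprocal fit is used here only because it is short; it amplifies noise at low responses, and the instrument's evaluation software or a nonlinear least-squares fit is preferable for real data. The concentrations and responses are synthetic.

```python
def steady_state_response(conc, r_max, kd):
    """1:1 Langmuir isotherm: equilibrium response at analyte concentration `conc`."""
    return r_max * conc / (kd + conc)


def fit_kd_double_reciprocal(concs, responses):
    """Estimate (KD, Rmax) from 1/R = (KD/Rmax)(1/C) + 1/Rmax by ordinary least squares."""
    xs = [1.0 / c for c in concs]
    ys = [1.0 / r for r in responses]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_max = 1.0 / intercept
    return slope * r_max, r_max


# Synthetic, noise-free data generated with KD = 40 nM and Rmax = 500 RU.
concs_nM = [10.0, 20.0, 50.0, 100.0, 200.0]
responses = [steady_state_response(c, 500.0, 40.0) for c in concs_nM]
kd, r_max = fit_kd_double_reciprocal(concs_nM, responses)
print(f"KD ≈ {kd:.1f} nM, Rmax ≈ {r_max:.1f} RU")
```

With a catalytically inactive 3-OST immobilized as the ligand, fitting a series of GAG concentrations in this way would give a rough affinity ranking of different HS preparations.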
Cell Lines and Cell Culture

The clonal L cell line LTA (35, 41), the generation of clone 33, an LTA transfectant that 
over-expresses the ryudocan^cAs cDNA (33), a rapidly growing revertant of clone 33, L-33+ (26),
and RFPEC, an immortalized line derived from rat fat-pad endothelial cells (8) have previously 
been described. Primary mouse neonatal endothelial cells from the cardiac microvasculature of 
day 3-5 neonates (CME cells) (from Dr. Jay Edelberg, MIT/Beth Israel Hospital) and COS-7 cells 
(ATCC) were employed. Primary human umbilical vein endothelial cells (HUVEC) were maintained according to
the supplier's (Clonetics Inc.) protocol. Unless otherwise stated, all cell lines were maintained in
logarithmic growth by subculturing biweekly in Dulbecco's modified Eagle medium (Life
Technologies, Inc.) containing 10% fetal bovine serum, 100 ng/ml streptomycin, and 100 units/ml
penicillin at 37 °C under a humidified 5% CO2 atmosphere, as previously described (42).
Exponentially growing cultures were generated by inoculating 54,000 cells/cm² and incubating for
two days, whereas post-confluent cultures were produced by inoculating 250,000 cells/cm² and
allowing growth for 10 days with medium exchanges on days 4, 7, 8, and 9.
Peptide Purification and Sequencing 

The purification of mouse 3-OST-1 from L-33+ cells has been described previously (26), and the final step 4 product was concentrated by reverse phase chromatography on an HP 1090 M system (Hewlett Packard) equipped with a C4 reverse phase HPLC column (250 × 2.1 mm, 300 Å pore size, 5 µm particle size) (Vydac, number 214TP52) equilibrated in 1.6% acetonitrile (v/v), 0.1% TFA (v/v). After application of the sample, the reverse phase matrix was washed with 60% acetonitrile, 0.1% TFA, and bound species were eluted with 78.4% acetonitrile, 0.1% TFA. Samples of 1.5 or 3 µg, from two independent purifications, were digested with 0.15 or 0.3 µg, respectively, of endopeptidase Lys-C (Wako) in a reaction volume of 100 µl containing 1% RTX100 (Calbiochem), 10% acetonitrile, and 100 mM Tris-HCl, pH 8.0, at 37 °C for ~16 h (43). Digestion products were chromatographed on an HP 1090 M system (Hewlett Packard) equipped with the C4 reverse phase HPLC column described above, equilibrated in 98% Buffer A (0.1% TFA (v/v))/2% Buffer B (80% acetonitrile (v/v)/0.85% TFA (v/v)). After application of the digestion products, the reverse phase matrix was washed with 98% Buffer A/2% Buffer B, and bound species were eluted with linear gradients of Buffer B increasing to 37.5% over 60 min, to 75% over 30 min, and to 98% over 15 min (44). The eluate was monitored for absorbance at 210 and 280 nm, and peptide peaks were individually collected and analyzed with a model 477A/120A Protein Sequenator (Applied Biosystems). In addition, the NH₂-terminal sequence of 1 µg of concentrated 3-OST-1 sample was determined directly.
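The peptide elution program above is piecewise linear in Buffer B. The Python sketch below, a hypothetical helper rather than instrument software, returns the programmed %B at any time, assuming time zero marks the start of the first gradient segment, and converts it to nominal acetonitrile content using the 80% acetonitrile composition of Buffer B (Buffer A contains no acetonitrile).

```python
# (time in min, % Buffer B) breakpoints of the piecewise-linear program;
# the column starts at 98% A / 2% B when the gradient begins (t = 0).
BREAKPOINTS = [(0.0, 2.0), (60.0, 37.5), (90.0, 75.0), (105.0, 98.0)]
ACETONITRILE_IN_B = 80.0  # % acetonitrile (v/v) in Buffer B

def percent_b(t_min):
    """Percent Buffer B delivered t_min minutes into the gradient."""
    if t_min <= BREAKPOINTS[0][0]:
        return BREAKPOINTS[0][1]
    for (t0, b0), (t1, b1) in zip(BREAKPOINTS, BREAKPOINTS[1:]):
        if t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    return BREAKPOINTS[-1][1]  # hold the final composition after the program

def percent_acetonitrile(t_min):
    """Nominal acetonitrile content, ignoring mixing volume effects."""
    return percent_b(t_min) * ACETONITRILE_IN_B / 100.0

if __name__ == "__main__":
    for t in (0, 30, 60, 75, 90, 105):
        print(f"{t:>4} min  {percent_b(t):5.2f}% B  "
              f"{percent_acetonitrile(t):5.2f}% acetonitrile")
```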
Isolation of Mouse 3-OST-1 Clones

Isolation of Cytoplasmic and Poly(A)+ RNA. Cytoplasmic RNA (17.5 mg) was isolated from post-confluent cultures of LTA cells (12 flasks of 175 cm², ~1.6 × 10⁹ cells) by a modification of the procedure of Favaloro (45). Monolayers were washed twice with PBS, cells were recovered by trypsinization and centrifugation (1000 × g for 2 min), and cell pellets were washed by resuspension in PBS followed by centrifugation (1300 × g for 4 min). Cells were lysed by vortexing for 30 sec in 12 ml of ice-cold 50 mM Tris, pH 7.4, 140 mM NaCl, 5 mM EDTA, 1% Triton X-100, 5 mM vanadium ribonucleoside complexes (Life Sciences Technologies). Samples were incubated on ice for 10 min and then vortexed for 1 min. Nuclei were pelleted by centrifugation at 6000 × g for 10 min, the supernatant was mixed with an equal volume of 200 mM Tris, pH 7.4, 300 mM NaCl, 2% SDS, 25 mM EDTA containing 200 µg/ml of proteinase K (Boehringer Mannheim), and the mixture was incubated at 65 °C for 2 hr. Samples were extracted twice against an equal volume of phenol/chloroform/isoamyl alcohol (25:24:1), the aqueous phase was combined with 0.7 volumes of isopropanol, and cytoplasmic RNA was pelleted by centrifugation at 3500 × g for 10 min and resuspended in 3.6 ml of 10 mM Tris, pH 7.4, 1 mM EDTA. Poly(A)+ RNA (59 µg) was isolated from 16 mg of cytoplasmic RNA by two sequential purifications against 100 mg of oligo(dT) cellulose (Life Sciences Technologies, #15939-010) according to the manufacturer's specifications, except that binding and wash buffers contained 0.1% SDS and LiCl was substituted for NaCl. The final eluate (1.5 ml) was extracted against 1.5 ml of phenol/chloroform/isoamyl alcohol (25:24:1), the aqueous phase was then adjusted to 100 mM LiCl and 260 mM NaCl, an equal volume of isopropanol was added, the mixture was centrifuged at 15,000 × g for 30 min, and the poly(A)+ RNA pellet was recovered in 40 µl of diethyl pyrocarbonate-treated water.

PCR Cloning and Generation of a Mouse 3-OST-1 Probe. Degenerate PCR primers 1S, 2S, 2A, and 3A (described in Shworak et al. (1997) J. Biol. Chem. 272, in press) were obtained from Bio Synthesis. First strand cDNA was generated in a 50 µl volume from 5 µg of LTA poly(A)+ RNA primed with oligo(dT), using an RT-PCR kit (Stratagene, La Jolla, CA) according to the manufacturer's specifications. Touchdown PCR (46, 47) reactions (50 µl) contained 1 µl of first strand cDNA, 25 pmol of each primer, 0.25 µl of AmpliTaq Gold (Perkin Elmer), 200 µM of each dNTP, and 1× GeneAmp PCR buffer. Two distinct sets of touchdown PCR conditions were required to obtain optimal yields of product. For amplification with primers 1S and 2A, reactions were heated to 95 °C for 9 min and subjected to 20 cycles of 94 °C for 30 sec and 68 °C for 1 min with a 0.5 °C reduction per cycle, followed by 20 cycles of 94 °C for 30 sec, 58 °C for 30 sec with a 0.5 °C reduction per cycle, and 75 °C for 30 sec, then 15 cycles of 94 °C for 30 sec, 55 °C for 10 sec, and ramping to 75 °C over 50 sec. Alternatively, for amplification with primers 1S and 3A or primers 2S and 3A, reactions were heated to 95 °C for 4 min and subjected to 47 cycles of 95 °C for 30 sec and 69.5 °C for 2 min with 0.2 °C and 1 sec reductions per cycle, followed by 25 cycles of 95 °C for 30 sec, 60 °C for 15 sec, and ramping to 75 °C over 1 min. Amplification products were purified as the retentate from centrifugal ultrafiltration against a 30,000 molecular weight cutoff membrane (Millipore, # SK1P343JO); 200 ng of DNA was then end-polished with Pfu DNA polymerase and subcloned into pCR-Script Amp SK(+) (Stratagene, La Jolla, CA, #211188) according to the manufacturer's specifications. One resulting plasmid, pNWS182, contained the 779 bp 1S/3A amplification product, which was released by digestion with EcoRI and SacII and isolated by low melting point agarose gel electrophoresis. A ³²P-labeled primer extension probe was then generated with a random primer labeling kit (Stratagene, La Jolla, CA, # 300385) by replacing the random primers with 5 nM of primer 3A.
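Touchdown programs are easy to mistype into a thermocycler, so it can help to expand them cycle by cycle. As an illustrative sketch, the Python code below expands the 1S/3A and 2S/3A program into per-cycle annealing/extension conditions (47 cycles starting at 69.5 °C for 2 min, each 0.2 °C cooler and 1 sec shorter, then 25 cycles at 60 °C for 15 sec); the class and function names are hypothetical, and the 95 °C denaturation steps and final ramps are omitted for brevity.

```python
from dataclasses import dataclass

@dataclass
class AnnealStep:
    cycle: int      # 1-based cycle number within the whole run
    temp_c: float   # annealing/extension temperature (deg C)
    hold_s: float   # hold time at that temperature (s)

def touchdown_phase(n_cycles, start_temp_c, start_hold_s,
                    temp_step_c=0.0, time_step_s=0.0, first_cycle=1):
    """Expand one phase, lowering temperature and hold time every cycle."""
    return [
        AnnealStep(
            cycle=first_cycle + i,
            temp_c=round(start_temp_c - i * temp_step_c, 2),
            hold_s=start_hold_s - i * time_step_s,
        )
        for i in range(n_cycles)
    ]

def program_3a():
    """1S/3A and 2S/3A program: 47 touchdown cycles, then 25 fixed cycles."""
    touchdown = touchdown_phase(47, 69.5, 120, temp_step_c=0.2, time_step_s=1)
    fixed = touchdown_phase(25, 60.0, 15, first_cycle=len(touchdown) + 1)
    return touchdown + fixed

if __name__ == "__main__":
    steps = program_3a()
    for step in (steps[0], steps[46], steps[47], steps[-1]):
        print(step)
```

The same helper can describe the 1S/2A program by substituting its phases (for example, 20 cycles from 68 °C in 0.5 °C steps).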

Construction and Screening of an L Cell cDNA Library. Using the manufacturer's recommended conditions, an oligo(dT)-primed λ ZAP Express cDNA library (Stratagene, La Jolla, CA, #200451) was generated from 5 µg of LTA poly(A)+ RNA that had been pretreated with methylmercury hydroxide. About 1.5 × 10⁶ primary recombinants were plaque amplified by infection into E. coli XL1-Blue MRF'. From the amplified library, 1.3 × 10⁶ plaques were transferred to Colony/Plaque Screen (Du Pont-New England Nuclear) and screened with the ³²P-labeled 3-OST-1-specific probe described above. Hybridizations were performed at 42 °C in 1.7× SSC, 8.3% dextran sulfate, 42% formamide, 0.8% SDS, and filters were washed twice with 2× SSC, 1% SDS for 30 min at 65 °C. Positive clones were plaque purified and then in vivo excised into pBK-CMV-based phagemids by infection with ExAssist helper phage, followed by transduction of filamentous phage particles into E. coli XLOLR.
Isolation of Human 3-OST-1 cDNA Clones

The National Center for Biotechnology Information data bank of I.M.A.G.E. Consortium (LLNL) expressed sequence tag cDNA clones (48) was probed with the deduced mouse 3-OST-1 amino acid sequence, revealing three partial-length species. I.M.A.G.E. Consortium CloneID 220372 (accession numbers H86812 and H86876) was from the retinal library of Soares (N2b4HR), whereas clones 301725 (accession numbers N90867 and W16558) and 301726 (accession numbers N90856 and W16555) were from the fetal lung library of Soares (NbHL19W); the clones were obtained from the TIGR/ATCC Special Collection (ATCC). The EcoRI/NotI insert of clone 220372 was ³²P-labeled by random priming and used to screen 5 × 10⁵ plaques from a λ TriplEx Brain cDNA library (Clontech, Palo Alto, CA), as described above. Positive plaques were purified, and TriplEx-based plasmids were in vivo excised according to the manufacturer's protocol and sequenced as described below.

Characterization of Mouse and Human 3-OST-1 cDNA Clones

The 5' and 3' regions of all partial and full-length clones were enzymatically sequenced from flanking primer sites of the respective cloning vectors. For full-length clones, the remaining sequence of both strands was obtained with internally priming oligonucleotides. Automated fluorescence sequencing was performed with Perkin Elmer Applied Biosystems Models 373A and 377 DNA sequencers. Each reaction typically yielded 400 to 600 bases of high-quality sequence. cDNA sequence files were aligned and compiled with the program Sequencher 3.0 (Gene Codes Corp.). All additional manipulations were performed with the University of Wisconsin Genetics Computer Group sequence analysis software package. Sequence comparison searches were performed on the GenBank, EMBL, DDBJ, PDB, SwissProt, PIR, and dbEST databases.

Expression of 3-OST-1 cDNAs

Construction of Expression Plasmids. The plasmid pCMV-3-OST contains the mouse 3-OST-1 cDNA, an EcoRI/XhoI fragment from pNWS228, inserted between the CMV promoter and the bovine growth hormone polyadenylation signal of EcoRI/XhoI-digested and phosphatase-treated pcDNA3 (Invitrogen). The plasmid pCMV-ProA3-OST has a similar structure, except that the first 26 amino acids of 3-OST-1 are replaced with 291 amino acids comprising a fusion of the transin leader sequence, Protein A, and a factor Xa cleavage site. pCMV-ProA3-OST was generated by ligating a BamHI/SmaI fragment containing the Protein A region from pRK5F10PROTA (49), and an XmaI (end-filled with T4 polymerase)/XhoI fragment containing most of the mouse 3-OST-1 cDNA from pNWS228, into BamHI/XhoI-digested and phosphatase-treated pcDNA3 (Invitrogen). The in vitro transcription plasmid pNWS237 contains a T3 promoter site 5' of the human 3-OST-1 cDNA and was constructed by inserting complementary oligonucleotides (Bio Synthesis) into the EcoRI site of the TriplEx-based plasmid pJL30.
Transient Expression of the Mouse 3-OST-1 cDNA in COS-7 Cells. For each expression construct, three 175 cm² flasks were seeded with 3.6 × 10⁶ COS-7 cells; 6 h later the medium was exchanged for DMEM containing 10% Nu-Serum (Life Technologies, Inc.), 100 µg/ml streptomycin, and 100 units/ml penicillin, and cells were grown for an additional day. Monolayers were washed with PBS and then incubated at 37 °C for 2.5 h with 10 ml/flask of freshly prepared DMEM containing 235 µg/ml DEAE-dextran (M.W. 500,000, Pharmacia), 9.5 mM Tris-HCl, pH 7.4, 0.9 mM chloroquine diphosphate (Sigma), and 3 µg/ml of the appropriate pcDNA3-based expression plasmid. Monolayers were then exposed to freshly prepared 10% DMSO in PBS for 1.5 min, washed twice with nonsupplemented DMEM, and fed 30 ml/flask of DMEM containing 10% fetal bovine serum, 100 µg/ml streptomycin, and 100 units/ml penicillin, and cells were grown for an additional day. Monolayers were washed with PBS, and cells were then grown for 24 h in 40 ml/flask of Serum-Free Medium (DMEM containing 25 mM HEPES, pH 8.0, 1% Nutridoma SP (Boehringer Mannheim) (v/v), an additional 2 mM glutamine, 10 ng/ml biotin (Pierce), 100 µg/ml streptomycin, 100 units/ml penicillin, and 1× of a previously described Trace Metal Mix (26)). COS cell-conditioned Serum-Free Medium was harvested, and debris was removed by centrifugation at 1,000 × g for 10 min followed by filtration through a 0.45 µm membrane; samples were then either processed immediately or snap frozen in liquid nitrogen and stored at -80 °C. Occasionally, conditioned medium from a second incubation of 8-24 h was also collected.
Purification of Wild-type and Protein A-Tagged Mouse Recombinant 3-OST-1. Wild-type mouse recombinantly expressed 3-OST-1 enzyme (r3-OST-1) was purified at 4 °C from 240 ml of freshly generated Serum-Free Medium conditioned by COS-7 cells transfected with pCMV-3-OST. The medium was adjusted to pH 8.0, mixed with an equal volume of 2% glycerol, and then loaded (25 ml/h) onto a heparin-AF Toyopearl-650M column (0.8 × 5.7 cm) (TosoHaas, Montgomeryville, PA) equilibrated in 50 mM NaCl, 10 mM Tris-HCl, pH 8.0, 1% glycerol (v/v) (Buffer C). The column was washed with 20 ml of Buffer C at a flow rate of 0.8 ml/min, then with 20 ml of 150 mM NaCl, 10 mM Tris-HCl, pH 8.0, 1% glycerol (v/v) at a flow rate of 0.5 ml/min, and protein was eluted at a flow rate of 0.25 ml/min with a 20 ml linear NaCl gradient extending from 150 mM to 750 mM NaCl in Buffer C. The fractions exhibiting HSact conversion activity (approximately 4 ml) were pooled, brought to a final concentration of 0.6% CHAPS (w/v) (Sigma), and dialyzed for 16 h against 4 l of 25 mM MOPS (3-[N-morpholino]propanesulfonic acid) (Sigma), pH 7.0, 1% glycerol (v/v), 0.6% CHAPS (w/v) (MCG buffer) containing 50 mM NaCl. The dialysate was applied to a 3',5'-ADP-agarose column (0.8 × 1.2 cm, 3.7 mmol of 3',5'-ADP/ml of gel) (Sigma) and eluted as previously described (26). The fractions containing HSact conversion activity were pooled (approximately 4 ml), aliquoted, frozen in liquid nitrogen, and stored at -80 °C.

Protein A-tagged mouse r3-OST-1 was purified at 4 °C from 155 ml of previously frozen Serum-Free Medium conditioned by COS-7 cells transfected with pCMV-ProA3-OST. IgG agarose beads (310 µl of a 50/50 slurry; Sigma) were gently stirred with the conditioned medium for 3 h, recovered by centrifugation at 2,000 × g for 10 min, and washed twice with 1 ml of MCG containing 250 mM NaCl to remove nonspecifically bound protein. The Protein A fusion protein was eluted from the beads with two sequential 30 min incubations in 100 µl of 50 mM sodium acetate, pH 4.5, 150 mM NaCl, 0.6% CHAPS, and 1% glycerol. The pooled eluates were combined with an equal volume of 500 mM MOPS, pH 7.0, 0.6% CHAPS, and 1% glycerol, then aliquoted, frozen in liquid nitrogen, and stored at -80 °C.
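Combining the pooled eluates with an equal volume of neutralizing buffer halves the acetate, NaCl, and MOPS concentrations while leaving CHAPS and glycerol unchanged, because both solutions contain them at the same level. The hypothetical Python helper below makes that volume-weighted arithmetic explicit; it assumes ideal mixing, and it does not predict the final pH.

```python
def mix(*components):
    """Final solute concentrations after combining solutions.

    Each component is (volume, {solute: concentration}). Volumes must share
    one unit, and each solute's concentration one unit, across components.
    """
    total_volume = sum(volume for volume, _ in components)
    final = {}
    for volume, solutes in components:
        for name, conc in solutes.items():
            final[name] = final.get(name, 0.0) + conc * volume / total_volume
    return final

if __name__ == "__main__":
    # Two 100 ul eluates pooled (200 ul), plus an equal volume of neutralizer.
    eluate = (200, {"sodium acetate mM": 50, "NaCl mM": 150,
                    "CHAPS %": 0.6, "glycerol %": 1})
    neutralizer = (200, {"MOPS mM": 500, "CHAPS %": 0.6, "glycerol %": 1})
    for solute, conc in mix(eluate, neutralizer).items():
        print(f"{solute}: {conc:g}")
```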
Retroviral Transduction of CHO and MNE Cells with 3-OST-1

Plasmid retrovirus vector construction. A retroviral transduction system was used to transduce CHO cells and mouse neonatal endothelial (MNE) cells. This system may serve as a model for in vivo transduction in gene therapy.

The retrovirus backbone plasmid pMSCV-PGK-EGFP is a derivative of pMSCVpac (Dr. Robert Hawley, University of Toronto). The puromycin acetyltransferase gene cassette in pMSCVpac was removed and replaced with an enhanced GFP (EGFP) cassette (Dr. David Baltimore, MIT). The pMSCV-PGK-EGFP vector was assembled by digesting the plasmid with HindIII and ClaI, followed by treatment with Klenow fragment. The 720 bp EGFP cistron fragment was derived from digestion of pMSCV-EGFPpac with EcoRI and blunting with Klenow fragment. The blunt-ended EGFP fragment was then ligated into the blunt-ended pMSCV vector, and the resulting plasmids were tested for proper orientation by restriction analysis. The reporter virus, pMSCVPLAP, is designed to express wild-type human placental alkaline phosphatase (PLAP) transcribed from the 5' LTR. pMSCV-SEAP-PGK-EGFP was made by cloning the 1.723 kb BglII/HpaI secreted alkaline phosphatase (SEAP) fragment from the pSEAP2-basic plasmid (Clontech, Palo Alto, CA) into BglII/HpaI-cut pMSCV-PGK-EGFP. pCMV-3-OST was digested with BglII and XhoI to release the wild-type mouse 3-OST-1 cDNA, and the 1.623 kb 3-OST-1 cDNA fragment was cloned into the BglII and XhoI sites of pMSCV-PGK-EGFP. The presence of the insert in the correct orientation was confirmed by restriction analysis. All plasmid DNA prepared for transfection was made with Invitrogen SNAP-MIDI kits according to the manufacturer's directions.

Cells and cell culture. Dulbecco's modified Eagle medium (DMEM), F-12 Ham's medium, penicillin/streptomycin, 0.25% trypsin, and 10 mM EDTA were obtained from Life Technologies, Inc., GIBCO-BRL (Gaithersburg, MD). The PHOENIX ecotropic retroviral packaging cell line (ATCC #SD 3444) was grown in DMEM, 10% heat-treated fetal bovine serum (FBS) (JRH Biosciences, Lenexa, KS), 100 units/ml penicillin, and 100 µg/ml streptomycin. PHOENIX cells were subcultured three times weekly at a split ratio of approximately 1:8 in a 37 °C humidified, 5.0% CO₂ incubator. CHO-K1 cells (ATCC CCL 61; CHO) were grown in F-12 medium supplemented with 10% fetal bovine serum, 100 units/ml penicillin, and 100 µg/ml streptomycin in a 37 °C humidified, 5.0% CO₂ incubator, and were subcultured three times weekly at a split ratio of approximately 1:4. CHO cells (1 × 10⁶) were transfected with 10 µg of pcB7-ECOTROPIC (generous gift of Dr. Harvey Lodish) by the standard calcium phosphate precipitation technique. Plasmid pcB7-ECOTROPIC expresses the MCAT1 gene (ecotropic retrovirus receptor cDNA) and a hygromycin resistance gene, transcribed from separate constitutive promoters. The transfected cells were selected for hygromycin resistance in 200 µg/ml hygromycin (Life Technologies), and the stable, hygromycin-resistant clones were assayed for their ability to take up and express the reporter virus (MSCVPLAP). Fixation and staining for cell-bound alkaline phosphatase were performed by standard techniques. CHO clone 4B was chosen because it transduced most efficiently at the highest dilution tested (i.e., 1:10,000), and was expanded for further analysis. Transduction of CHO4B with ecotropic retroviruses is equal to that achievable with NIH3T3 cells. Low passage number (passage 2-5) primary mouse neonatal cardiac endothelial cells (MNE) were prepared by standard techniques. MNE cells were cultured in a 1:1 (v/v) admixture of EGM:EGM-2 (CLONETICS) in a 37 °C, humidified, 5.0% CO₂ incubator, and were subcultured once weekly at a split ratio of approximately 1:3.

Northern blot analysis. Total RNA was prepared from confluent T-80 flasks of each of the transduced and untransduced cells using the QIAGEN RNeasy kit with QIASHREDDER. Total cellular RNA (10 µg) was denatured, resolved by electrophoresis in a 1.5% agarose gel, and blotted onto GENE-Screen+ (DuPont NEN) with 2× SSPE. The membrane was then UV cross-linked using a STRATAlinker. ³²P-radiolabeled cDNA probes were prepared from the DNA fragments used for cloning the mouse 3-OST-1 and SEAP, as described above. Radiolabeled probes were prepared using 25 ng of each template, the Amersham Megaprime kit, and [α-³²P]dCTP from DuPont NEN according to the manufacturer's directions. Hybridizations were performed in sealable plastic bags at 68 °C with 1 × 10⁶ cpm of probe/ml in 10 ml of QUICKHYB (Stratagene, La Jolla, CA), following the manufacturer's instructions. Post-hybridization washes were: once for 15 minutes in 1× SSPE, 1.0% SDS at 45 °C; and then twice for 15 minutes each in 0.2× SSPE, 0.5% SDS at 65 °C. After washing, the blots were briefly air dried, placed in sealable plastic bags, and exposed to Kodak XAR-MS film with intensifying screens at -80 °C for overnight to five days. Hybridizing signal intensity was quantitated using a Betascope 603 blot analyzer. Transcripts derived from the 5' LTR of these engineered proviruses are large (ca. 7 kb). Because these transcripts are large, the proviruses have multiple sites of transcriptional initiation (the 5' LTR and pgk promoters), and the 3-OST-1 construct has more than one poly(A) addition signal, bona fide hybridizable mRNA appears as species of different sizes in northern blot analysis. The total amount of hybridizing material detected per sample lane with any one probe was therefore used to calculate and compare mRNA expression levels.
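Because each provirus gives rise to several hybridizing species, expression was compared using the summed signal per lane. A minimal Python sketch of that bookkeeping follows; the lane names and band counts are hypothetical, comparisons are meant to be made within one probe, and any loading normalization (which the text above does not specify) is deliberately left out.

```python
def lane_totals(bands_by_lane):
    """Sum every hybridizing band signal in each lane (one probe per blot)."""
    return {lane: sum(bands) for lane, bands in bands_by_lane.items()}

def relative_expression(bands_by_lane, reference_lane):
    """Each lane's total signal divided by the reference lane's total."""
    totals = lane_totals(bands_by_lane)
    reference = totals[reference_lane]
    if reference == 0:
        raise ValueError("reference lane has no hybridizing signal")
    return {lane: total / reference for lane, total in totals.items()}

if __name__ == "__main__":
    # Hypothetical analyzer counts for bands of several sizes per lane.
    blot = {
        "3-OST-1 virus, sample A": [5200, 2100, 900],
        "3-OST-1 virus, sample B": [2600, 1000, 500],
        "untransduced": [0],
    }
    print(relative_expression(blot, "3-OST-1 virus, sample B"))
```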

Virion production. Virions were produced by programming ecotropic PHOENIX packaging cells with recombinant provirus plasmids using the calcium phosphate transfection technique. Each recombinant retroviral construct plasmid (10 µg/well) was transfected by calcium phosphate precipitation with an overnight incubation. Following the precipitation step, the cells were re-fed with 2 ml/well of fresh DMEM and incubated overnight. Each 2 ml of viral supernatant was collected and either flash-frozen in liquid nitrogen and stored at -80 °C, or used directly after a low-speed centrifugation.

Transduction protocol. Target cells were trypsinized, counted with a Coulter cell counter, and plated at 150,000 cells (NIH 3T3/CHO4B) or 50,000 cells (MNE) per well of a 6-well cluster plate. Twenty-four hours later, target cells (<70% confluent) were incubated overnight with viral supernatants containing as adjuvants either 5 µg/ml polybrene for NIH3T3/CHO4B or 25 µg/ml DEAE-dextran (Pharmacia) for MNE. After 12 hours of virus exposure, the growth medium was replaced. CHO cells destined for FACS sorting were exposed to recombinant retrovirus twice at a multiplicity of infection (MOI) of 0.3. MNE cells were transduced once for 12 hours at an MOI of 0.74 for recombinant 3-OST-1 virus and 0.72 for recombinant SEAP virus. Transduced cells were incubated in fresh growth medium for 48 hours prior to FACS to allow maximal proviral expression. Recombinant virus titers ranged from 1 × 10⁵ to 2 × 10⁶ infectious particles per ml, as measured with either NIH3T3 or CHO4B cells by FACS analysis scoring for EGFP-positive cells. Virus titers were approximately eight- to ten-fold lower on primary MNE cells than on NIH3T3 cells.
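The multiplicities of infection quoted above tie together cell number, virus titer, and supernatant volume. The Python sketch below computes the supernatant volume needed for a target MOI, estimates a titer from the EGFP-positive fraction scored by FACS, and gives the expected transduced fraction under the usual assumption that infection events follow a Poisson distribution. The helper names and the example titer are hypothetical, and treating two exposures as additive MOIs is a simplifying assumption.

```python
import math

def volume_for_moi(moi, n_cells, titer_per_ml):
    """Millilitres of viral supernatant that deliver the target MOI."""
    return moi * n_cells / titer_per_ml

def titer_from_facs(fraction_egfp_positive, n_cells_at_transduction,
                    volume_ml, dilution_factor=1.0):
    """Infectious particles per ml estimated from the EGFP-positive fraction.

    Most reliable when the positive fraction is low (roughly below 0.3),
    so that most positive cells received a single provirus.
    """
    return (fraction_egfp_positive * n_cells_at_transduction
            * dilution_factor / volume_ml)

def fraction_transduced(moi, exposures=1):
    """Poisson estimate of cells receiving at least one infectious particle.

    Assumes repeated exposures simply add their MOIs.
    """
    return 1.0 - math.exp(-moi * exposures)

if __name__ == "__main__":
    # Hypothetical titer of 2e5 infectious particles/ml on MNE cells.
    print(f"{volume_for_moi(0.74, 50_000, 2e5):.3f} ml for MOI 0.74")
    print(f"{fraction_transduced(0.3, exposures=2):.2f} of CHO cells "
          f"expected transduced after two MOI 0.3 exposures")
```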

Cell-Free Synthesis of Mouse and Human r3-OST-1.

Synthetic capped mouse and human 3-OST-1 mRNAs were generated from NotI-linearized pNWS228 and HindIII-linearized pNWS237, respectively, using T3 polymerase and m⁷G(5')ppp(5')G, as previously described (50). Unlabeled in vitro translation reactions (25 µl) contained 0.25 µg of synthetic mRNA, 1.8 µl of canine pancreatic microsomal membranes (Promega), and 0.5 µl each of Amino Acid Mixture Minus Leucine and Amino Acid Mixture Minus Methionine, and were performed with nuclease-treated reticulocyte lysate (Promega) according to the manufacturer's specifications.

Measurement of HSact Conversion Activity. The HSact conversion activity of crude and purified r3-OST-1 samples (a 3-OST-1-catalyzed reaction that requires unlabeled PAPS to convert ³⁵S-HSinact into ³⁵S-HSact) was determined by comparison against a standard curve generated with 1 to 32 units of previously purified native 3-OST-1, as previously described (26). The ³⁵S-HSinact substrate was purified from metabolically labeled cell surface HS of exponentially growing clone 33 cells, as previously described (35).
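Reading an unknown sample off a 1-32 unit standard curve amounts to inverse interpolation. The Python sketch below assumes, purely for illustration, that an increasing assay signal (for example, cpm of ³⁵S-HSact formed) is recorded for each standard and interpolates linearly in log2(units) between the bracketing standards; the standard values are hypothetical, and the curve shape used in the cited method (26) may differ.

```python
import bisect
import math

def units_from_signal(signal, std_units, std_signal):
    """Interpolate enzyme units for an unknown sample from a standard curve.

    std_units and std_signal must both be strictly increasing.
    Interpolation is linear in log2(units) between bracketing standards.
    """
    if not std_signal[0] <= signal <= std_signal[-1]:
        raise ValueError("signal outside the standard curve; adjust sample dilution")
    i = bisect.bisect_left(std_signal, signal)
    if std_signal[i] == signal:
        return float(std_units[i])
    x0, x1 = math.log2(std_units[i - 1]), math.log2(std_units[i])
    y0, y1 = std_signal[i - 1], std_signal[i]
    return 2.0 ** (x0 + (x1 - x0) * (signal - y0) / (y1 - y0))

if __name__ == "__main__":
    # Hypothetical standards: 1-32 units of native enzyme vs. cpm of product.
    units = [1, 2, 4, 8, 16, 32]
    cpm = [500, 1000, 1900, 3600, 6500, 11000]
    print(f"{units_from_signal(2750, units, cpm):.2f} units")
```

Rejecting out-of-range signals, rather than extrapolating, mirrors the usual practice of re-assaying a sample at a different dilution.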
Identification of Enzymatic Reaction Products 

³⁵S-labeling of HS by r3-OST-1. ³⁵S-labeled HS was generated by incubating the various forms of r3-OST-1 with [³⁵S]PAPS and unlabeled HSinact, which were prepared as previously described (26, 35). Wild-type and Protein A-tagged r3-OST-1 (2500 units of HSact conversion activity) purified from COS cell-conditioned medium were incubated in a 500 µl reaction mixture, as previously described (26), for 2 h at 37 °C, and ³⁵S-labeled polysaccharides were purified by DEAE-Sepharose chromatography as previously described (26). For cell-free synthesized r3-OST-1, ³⁵S-labeling of HS was performed in a reticulocyte lysate-based reaction mixture (35), except that 100 µl reactions contained 100 to 300 units of in vitro translated r3-OST-1, 180 nM unlabeled HSinact, and 5 µM PAPS (60 × 10⁶ cpm), and samples were incubated at 37 °C for 2 h. The reaction was quenched by adding 300 µl of 267 mM NaCl, 13.3 µg/ml glycogen and extracting against 600 µl of phenol/chloroform/isoamyl alcohol (25:24:1). ³⁵S-labeled GAGs were ethanol precipitated (35) and then isolated by DEAE chromatography as previously described (26).
Identification of the Site of Sulfation on HSact and HSinact. The DEAE eluates containing ³⁵S-labeled polysaccharide were vacuum concentrated to 1/5 volume and then desalted at a flow rate of 0.9 ml/min on TSK G3000 PWxl (0.78 × 30 cm) and TSK G2500 PWxl (0.78 × 30 cm) (TosoHaas) columns connected in series and equilibrated in 0.1 M ammonium bicarbonate. The desalted product was then affinity fractionated using AT/ConA gel to obtain HSact and HSinact, as described previously (26). Analysis of labeled products by treatment with GAG lyases and low pH nitrous acid was performed as previously described (42). In addition, the HSact and HSinact samples were each subjected to hydrazinolysis, high pH nitrous acid (pH 5.5), low pH nitrous acid (pH 1.5), and sodium borohydride reduction, and the resultant disaccharides were characterized by reverse phase ion pairing HPLC (RPIP-HPLC) as previously reported (33, 34). The identification of [³⁵S]GlcA→AMN-3-O-SO₃ and [³⁵S]GlcA→AMN-3,6-O-(SO₃)₂ was confirmed by co-chromatography on RPIP-HPLC with the appropriate ³H-labeled disaccharide standards, as described in prior publications (33, 34).
Northern Blot Analysis 

Total RNA from RFPEC and primary mouse CME cells was isolated by the method of Chomczynski and Sacchi (51), whereas poly(A)+ RNA was isolated from HUVEC cells as described above for LTA cells. Total RNA from the mast cell line CI.MC/C57.1 (C57.1) (52) was provided by Dr. Stephen J. Galli (Beth Israel Hospital). Samples were resolved on 1.2% formaldehyde-agarose gels and subjected to Northern blot analysis as previously described (50). Mouse and human samples were hybridized with mouse or human probes, respectively, and washed as described above for library screening, except that hybridizations were performed at 60 °C.
Peptide Sequencing and PCR Generation of a Mouse 3-O-Sulfotransferase-1 (3-OST-1) Probe

The information necessary for the molecular cloning of mouse heparan sulfate D-glucosaminyl 3-O-sulfotransferase-1 (3-OST-1) was obtained by sequencing the amino terminus and Lys-C-generated peptides of the enzyme, which we had previously purified from large quantities of serum-free tissue culture medium conditioned by an L cell line (26). These studies established the structures of 14 partially overlapping peptides encompassing 185 amino acid residues. Degenerate PCR primers were synthesized based on the sequence of the amino terminus (primer 1S) and of two endopeptidase-derived fragments (primers 2S, 2A, and 3A). When PCR was performed on an LTA first strand cDNA template, products of about 210 bp (primers 1S/2A), 780 bp (primers 1S/3A), and 610 bp (primers 2S/3A) were obtained, suggesting that all of the primer sites are contained within a single cDNA. To confirm this, the two largest fragments were cloned into pCR-Script Amp SK(+) and the inserts were sequenced, which revealed that the 1S/3A product is 779 bp and contains the 611 bp 2S/3A product. Because the 779 bp insert encodes 12 of the sequenced peptide fragments, it was ³²P-labeled as described above and used as a probe for cDNA library screening.
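Primers designed from peptide sequence must cover every codon for each residue, so their complexity grows multiplicatively with length. The Python sketch below computes the fold degeneracy of a primer from the peptide it encodes, using codon counts from the standard genetic code, and finds the least degenerate stretch of a given length. The example peptide is hypothetical, since the actual sequences of primers 1S, 2S, 2A, and 3A are given in the cited reference rather than here.

```python
from math import prod

# Number of codons for each amino acid in the standard genetic code.
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2,
    "G": 4, "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2,
    "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}

def primer_degeneracy(peptide):
    """Number of distinct DNA sequences that encode the peptide."""
    return prod(CODONS_PER_AA[aa] for aa in peptide.upper())

def least_degenerate_window(peptide, length):
    """(1-based start, degeneracy) of the least degenerate sub-peptide."""
    best = None
    for i in range(len(peptide) - length + 1):
        degeneracy = primer_degeneracy(peptide[i:i + length])
        if best is None or degeneracy < best[1]:
            best = (i + 1, degeneracy)
    return best

if __name__ == "__main__":
    demo = "LLMWKYDS"  # hypothetical peptide, not a sequenced 3-OST-1 fragment
    print(least_degenerate_window(demo, 6))
```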

Isolation and Characterization of Mouse 3-OST-1 cDNAs

An amplified λ ZAP Express LTA cDNA library of 1.5 × 10⁶ primary recombinants was constructed, and 1.3 × 10⁶ plaques were screened with the probe described above, revealing 40 positives that were plaque purified and in vivo excised into plasmids. The cDNA insert of each plasmid was characterized to eliminate duplicate recombinants arising from library amplification. Insert size was determined by liberating the cDNA inserts by digestion at the flanking EcoRI and XhoI restriction sites followed by agarose gel electrophoresis; in addition, the sequence at both ends of each insert was obtained from flanking vector primer sites. This analysis revealed 25 unique primary recombinants, which predominantly contained inserts of approximately 1.7, 2.3, or 3.3 kb. These different species were considered to reflect natural size variants of the mouse message, since northern blots of LTA poly(A)+ RNA hybridized with the 3-OST-1 probe revealed the same three size categories of message. Complete sequencing of 9 distinct primary recombinants, at least 2 from each size category, together with partial sequencing of the remaining 16 clones, showed that the size variants result from differences in the length of the 5' untranslated region due to the insertion of 0-1629 bp at a single common internal point, the splice variant site. Most importantly, all clones shared identical protein coding regions; therefore, only the characterization and analysis of the shortest species, the Class 1 cDNA, which lacks additional sequence at the splice variant site, is described below.
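The de-duplication step above treats clones that share insert size and end sequences as siblings produced by library amplification. A minimal Python sketch of that grouping follows; the record fields, the size tolerance, the tag length compared, and the example clones are all hypothetical choices, not parameters stated in this specification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Clone:
    name: str
    insert_kb: float  # insert size from EcoRI/XhoI digestion
    end5: str         # sequence read from the 5' flanking vector primer
    end3: str         # sequence read from the 3' flanking vector primer

def unique_recombinants(clones, size_tol_kb=0.1, tag_len=30):
    """Group clones that share end tags and (within tolerance) insert size.

    Each returned group is taken to represent one primary recombinant.
    """
    groups = []
    for clone in clones:
        key = (clone.end5[:tag_len], clone.end3[:tag_len])
        for group in groups:
            ref = group[0]
            same_ends = (ref.end5[:tag_len], ref.end3[:tag_len]) == key
            if same_ends and abs(ref.insert_kb - clone.insert_kb) <= size_tol_kb:
                group.append(clone)
                break
        else:
            groups.append([clone])
    return groups

if __name__ == "__main__":
    demo = [Clone("p1", 1.70, "GGCACGAG", "AAAAAAAA"),
            Clone("p2", 1.72, "GGCACGAG", "AAAAAAAA"),
            Clone("p3", 3.30, "TTGCCAGC", "AAAAAAAA")]
    print([[c.name for c in g] for g in unique_recombinants(demo)])
```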

Sequence data were obtained from 2 essentially full-length Class 1 cDNAs and 5 partial-length cDNAs to create a composite cDNA structure of 1685 bp (SEQ ID NO: 1), excluding the 3' poly(A) tract. The 5' untranslated region is 322 bp, with the splice variant site occurring between nucleotides 216 and 217. This region contains 6 ATG sites that do not conform to consensus initiation sites (53) and are closely followed by in-frame termination codons. An open reading frame of 933 bp begins at position 323 with the first consensus initiation ATG (a purine occurs at -3) (53). The length of the 3' untranslated region among all of the cDNA clones analyzed ranged from 301 to 430 bp. Within this terminal 129 bp interval, 5 distinct polyadenylation sites were observed, and 13-18 bp upstream of each site is a variant of the consensus polyadenylation signal. Poly(A) tails were most frequently observed at the first site (position 1556, ~50% of clones).
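The choice of initiation codon above follows the Kozak rule (a purine at -3, ideally with a G at +4). The Python sketch below scans a cDNA for the first ATG meeting that rule and reads the open reading frame to the first in-frame stop codon. It is a simplified stand-in for the analysis described, the demonstration sequence is synthetic, and positions are reported 1-based to match the numbering used for SEQ ID NO: 1; whether a reported ORF length counts the stop codon is a convention choice, and this sketch excludes it.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_kozak_atg(seq, require_g_plus4=False):
    """1-based position of the first ATG with a purine at -3 (optionally G at +4)."""
    seq = seq.upper()
    for i in range(3, len(seq) - 3):
        if seq[i:i + 3] != "ATG":
            continue
        if seq[i - 3] not in "AG":           # -3 relative to the A of ATG
            continue
        if require_g_plus4 and seq[i + 3] != "G":  # +4, first base after ATG
            continue
        return i + 1
    return None

def orf_from(seq, start_1based):
    """Coding sequence from start up to (not including) the first stop codon."""
    seq = seq.upper()
    i = start_1based - 1
    codons = []
    while i + 3 <= len(seq):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            return "".join(codons)
        codons.append(codon)
        i += 3
    return None  # no in-frame stop codon found

if __name__ == "__main__":
    demo = "CTCATGTAAGCCACCATGGCTAAATGACC"  # synthetic test sequence
    start = first_kozak_atg(demo, require_g_plus4=True)
    print(start, orf_from(demo, start))
```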

Isolation and Characterization of Human 3-OST-1 cDNAs

Three clones containing partial-length human 3-OST-1 cDNAs were identified by EST database searching (48) and obtained from the TIGR/ATCC Special Collection, as described above. Sequencing of the insert ends revealed the clones to be essentially equivalent, as each contained the same 947 bp region of the human 3-OST-1 cDNA. The insert of I.M.A.G.E. Consortium CloneID 220372 was ³²P-labeled and used to screen 5 × 10⁵ plaques from a λ TriplEx Brain cDNA library. Three positives were identified and isolated as TriplEx plasmids, and the largest cDNA (1.3 kb) was sequenced completely.

The nucleic acid sequences of the mouse and human 3-OST-1 cDNAs are ~85% identical. The largest isolated human clone contains 118 bp of 5' untranslated region with 2 nonconsensus ATG sites. The sequences of the human and mouse cDNAs flanking the splice variant site are distinct on the 5' side (positions 211-216 of SEQ ID NO: 1 and positions 5-10 of SEQ ID NO: 3) but identical on the 3' side (positions 217-222 of SEQ ID NO: 1 and positions 11-16 of SEQ ID NO: 3), which raises the possibility that human 3-OST-1 mRNA may also exhibit 5' splice variants. The first consensus ATG (with a purine at -3 and a G at +4) (53) initiates an open reading frame of 921 bp. For all 4 human cDNA clones examined, only a single polyadenylation site was observed, resulting in a 3' untranslated region of 266 bp, which is 26 bp less than the most frequently observed 3' limit for the mouse cDNAs.
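Percent identity figures such as the ~85% nucleotide identity above are computed over an alignment. A minimal Python sketch follows, assuming the two sequences have already been aligned (gaps written as "-") and counting identity over columns in which neither sequence has a gap; other conventions, such as dividing by the full alignment length, give somewhat different numbers, and similarity scores additionally credit conservative substitutions, which this sketch does not.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical residues over ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

if __name__ == "__main__":
    # Hypothetical aligned fragments, not actual 3-OST-1 sequence.
    print(f"{percent_identity('ATGGCT-GACCTA', 'ATGGCAAGACTTA'):.1f}% identical")
```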

Predicted Protein Structures of Mouse and Human 3-OST-1

The mouse and human cDNAs encode novel 311 and 307 amino acid proteins of 35,876 and 35,750 daltons, respectively, that exhibit 93% similarity. The deduced mouse primary structure contains regions corresponding to all 13 sequenced peptides and the amino terminus.

25 For both types of 3-OST-l, the encoded protein is predicted to be an intraluminal resident. Kyte- 
Doolittle hydropathy analysis reveals only a single major hydrophobic region which begins at the 
amino terminus and lacks sufficient length for a membrane spanning domain. Moreover, the 
hydrophobic region differs from a membrane anchor in that it contains two glutamine residues and 

( cr- 

is not flanked by cationic residues. Thus, the above stretch of 1 8 residues constitutes a 
30 hydrophobic leader signal, and this region is followed by a signal peptidase cleavage site between 
amino acids 20 and 21, as determined by the method of von Heijne (54). The possibility of signal 
peptidase cleavage is supported by the ammo-terminal analysis of mouse 3-OST-l, which began 
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with His 21 . Given that heparan biosynthesis is considered to occur in the /raras-Golgi, the above 
data suggest that the 3-OST-l is an intraluminal enzyme. Just past the signal peptidase cleavage 
site, the mouse 3-OST-l contains an extra 4 residues (Asp^-Pro^-Gly^-Pro 27 * not found in the 
human form. Both 3-OST-l proteins exhibit 5 potential 7V-glycosylation sites which account for 

5 the apparent discrepancy between the molecular weights of the predicted amino terminus trimmed 
enzyme (-34 kDa) and the previously purified enzyme (a broad band of 46 kDa was observed on 
SDS-PAGE) (26). Only two cysteine residues are present, and these closely spaced residues are 
likely to form a disulfide bond which generates a peptide loop of 10 amino acids. Interestingly, 
the carboxy 140 residue region is extremely basic (25% H, K, R; 12% E, D); however, this region 

1 0 does not exhibit previously recognized heparin binding motifs. 
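Two of the sequence analyses above can be reproduced with a few lines of code: a Kyte-Doolittle sliding-window hydropathy profile and a count of basic (H, K, R) versus acidic (D, E) residues. The sketch below uses the standard Kyte-Doolittle scale. A 19-residue window, with window means above roughly 1.6 flagging candidate membrane-spanning segments, is the heuristic commonly associated with that method. The window size, function names and toy peptide are assumptions for illustration, and the actual 3-OST-1 sequences are not reproduced here.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> List[float]:
    """Mean hydropathy of every full window; element i covers seq[i:i + window]."""
    values = [KD[aa] for aa in seq.upper()]
    if window > len(values):
        return []
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def charge_fractions(seq: str) -> Tuple[float, float]:
    """Fractions of basic (H, K, R) and acidic (D, E) residues in seq."""
    seq = seq.upper()
    basic = sum(seq.count(aa) for aa in "HKR")
    acidic = sum(seq.count(aa) for aa in "DE")
    return basic / len(seq), acidic / len(seq)


# Hypothetical peptide: hydrophobic start followed by a basic stretch.
profile = hydropathy_profile("IIIIIKKKKK", window=5)
print([round(x, 2) for x in profile])
print(charge_fractions("KKDA"))  # (0.5, 0.25)
```

Applied to a carboxy-terminal 140-residue slice, `charge_fractions` would give the kind of composition figures quoted above.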

Recombinant Expression of Mouse and Human 3-OST-1 Enzyme (r3-OST-1)

Three distinct expression approaches were employed to confirm that the isolated cDNAs encode 3-OST-1 enzyme. The recombinantly expressed enzyme was designated r3-OST-1 to distinguish it from the previously purified native 3-OST-1 enzyme.

First, the vector pCMV-3-OST (a pcDNA3 derivative in which the CMV promoter transcribes the mouse 3-OST-1 cDNA) was transiently expressed in COS-7 cells. The level of HSact conversion activity that accumulated in Serum-Free Medium over 32 h was then measured, as described above. HSact conversion activity is a 3-OST-1-catalyzed reaction that requires unlabeled PAPS to convert ³⁵S-HSinact into ³⁵S-HSact. Before or after pcDNA3 transfection, COS-7 conditioned Serum-Free Medium typically contained a low but detectable amount of HSact conversion activity. In contrast, transfection with pCMV-3-OST elevated levels ~2,000-fold.

Second, a Protein A/3-OST-1 fusion protein was analyzed to exclude the remote possibility that expression of the mouse 3-OST-1 cDNA indirectly induces, rather than directly encodes, HSact conversion activity. COS-7 cells were transiently transfected with pCMV-ProA3-OST, a pCMV-3-OST derivative in which the amino-terminal 26 residues of mouse 3-OST-1 are replaced with a Protein A tag. Protein A-tagged mouse r3-OST-1 was then extracted with IgG agarose beads from 155 ml of conditioned Serum-Free Medium, as described above. From the control pcDNA3 samples, the affinity purification recovered no detectable HSact conversion activity. From the pCMV-3-OST samples, it recovered less than 0.5% of the initial activity. From the pCMV-ProA3-OST samples, in contrast, ~7,000 units (10% recovery) were extracted. Thus, the mouse 3-OST-1 cDNA directly encodes HSact conversion activity.
Third, the activities of cell-free synthesized mouse and human r3-OST-1 were examined. Synthetic capped mouse and human 3-OST-1 mRNAs were generated by in vitro transcription. They were then translated in vitro with reticulocyte lysate in the presence and absence of canine pancreatic microsomal membranes, as described above. HSact conversion activity was undetectable in control in vitro translation reactions that lacked mRNA template, with or without microsomal membranes.

Adding synthetic 3-OST-1 mRNA templates to translation reactions lacking microsomal membranes produced a low level of HSact conversion activity (mouse, 0.86 ± 0.028 units/µl, n = 3; human, 2.1 ± 0.063 units/µl, n = 3). Levels ~15-fold greater occurred when microsomal membranes were included in the translation reactions (mouse, 14.3 ± 0.27 units/µl, n = 3; human, 32.4 ± 2.1 units/µl, n = 3). This apparent activation of nascent r3-OST-1 by co-translational processing within microsomes may result from signal peptidase cleavage, N-linked glycosylation, and/or facilitation of correct protein folding. The slightly greater production from the human 3-OST-1 cDNA may reflect the more favorable context of the human initiation codon or the shorter human 5' untranslated region. Independent of these considerations, the data confirm that the isolated mouse and human cDNAs encode HSact conversion activity.
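The "~15-fold" activation quoted above is the ratio of activity with microsomes to activity without them. The sketch below computes that ratio and a first-order propagated error. The text does not state whether the ± values are standard deviations or standard errors, so the sketch assumes they are independent errors of the same kind and propagates them in quadrature. The function name is hypothetical.

```python
import math
from typing import Tuple


def ratio_with_error(num: float, num_err: float,
                     den: float, den_err: float) -> Tuple[float, float]:
    """Return num/den and its error by first-order propagation,
    assuming the two errors are independent."""
    ratio = num / den
    rel_err = math.sqrt((num_err / num) ** 2 + (den_err / den) ** 2)
    return ratio, ratio * rel_err


# Activities reported above (units/µl, n = 3): with vs. without microsomes.
mouse = ratio_with_error(14.3, 0.27, 0.86, 0.028)
human = ratio_with_error(32.4, 2.1, 2.1, 0.063)
print(f"mouse: {mouse[0]:.1f} +/- {mouse[1]:.1f}-fold")  # mouse: 16.6 +/- 0.6-fold
print(f"human: {human[0]:.1f} +/- {human[1]:.1f}-fold")  # human: 15.4 +/- 1.1-fold
```

Both ratios are consistent with the approximate 15-fold figure given in the text.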

Next, the biochemical specificity of the HSact conversion activity generated by each expression approach was examined. Crude or purified enzyme was incubated with [³⁵S]PAPS and unlabeled HSinact, radiolabeled GAG was recovered by DEAE chromatography, and the resultant products were characterized.

The HSact conversion activity of wild-type mouse r3-OST-1 was produced by transfecting COS-7 cells with pCMV-3-OST (1.35 x 10⁶ units in 240 ml of conditioned Serum-Free Medium). It was first purified away from potential contaminating sulfotransferase activities by heparin-AF Toyopearl chromatography followed by 3',5'-ADP-agarose chromatography. This yielded ~1 µg of protein containing 340,000 units (~20,000-fold purification with 25% overall recovery). In contrast, the IgG agarose-purified Protein A-tagged r3-OST-1 and the in vitro translation reactions of mouse and human 3-OST-1 mRNA templates were analyzed directly, as described above. Each of these preparations generated about 0.5-1 x 10⁶ cpm of product: purified wild-type r3-OST-1, purified Protein A-tagged r3-OST-1, and nonpurified in vitro translation reactions containing mouse or human r3-OST-1.

Portions of each labeled product were incubated with purified heparitinase (0.5 units/ml) or chondroitinase ABC (0.5 units/ml). HPLC-GPC analysis indicated that in all cases the label was incorporated exclusively into HS. Portions of the labeled HS samples were also N-desulfated with nitrous acid at pH 1.5 and analyzed by P-2 polyacrylamide gel filtration to determine the amount of liberated free [³⁵S]sulfate, as described above. No increase in free [³⁵S]sulfate was detected. Finally, portions of the labeled samples were AT-affinity fractionated. In each case, ~40% of the ³⁵S label was incorporated into HSact and ~60% into HSinact.

The labeled HSact and HSinact generated by purified wild-type r3-OST-1 were chemically cleaved to disaccharides by nitrous acid treatment. Appropriate ³H-labeled disaccharide standards were added, and the ³⁵S- and ³H-labeled species were coresolved by RPIP-HPLC as outlined above. The ³⁵S label coeluted with [³H]GlcA→AMN-3-O-SO3 and [³H]GlcA→AMN-3,6-O-(SO3)2, respectively. This approach also showed that Protein A-tagged r3-OST-1 and in vitro translation-derived mouse and human r3-OST-1 generated ³⁵S-HS containing only ³⁵S-labeled disaccharides that coeluted with [³H]GlcA→AMN-3-O-SO3 and [³H]GlcA→AMN-3,6-O-(SO3)2, respectively. It was previously shown that ³⁵S-labeled GlcA→AMN-3,6-O-(SO3)2 generated by purified 3-OST-1 enzyme contains ³⁵S solely in the 3-O position (26). Thus, the expressed HSact conversion activities exclusively catalyze the transfer of sulfate to the 3-O position of glucosamine units in HSact and HSinact.
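The recovery and fold-purification figures above follow the usual purification-table definitions. Recovery is final activity divided by initial activity. Fold purification is final specific activity divided by initial specific activity. The sketch below encodes both with hypothetical function names. The 25% recovery can be checked directly from the reported units. The ~20,000-fold figure cannot be recomputed here because the protein content of the starting conditioned medium is not stated, so `fold_purification` is shown only in general form.

```python
def percent_recovery(start_units: float, end_units: float) -> float:
    """Overall yield of enzyme activity, in percent."""
    return 100.0 * end_units / start_units


def fold_purification(start_units: float, start_protein: float,
                      end_units: float, end_protein: float) -> float:
    """Ratio of final to initial specific activity (units per unit protein).
    Both protein amounts must use the same unit (e.g., µg)."""
    return (end_units / end_protein) / (start_units / start_protein)


# Activity values reported above for wild-type mouse r3-OST-1.
print(f"{percent_recovery(1.35e6, 340_000):.0f}% recovery")  # 25% recovery
```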

Northern Analysis of Rodent and Human 3-OST-1 Expression

Northern blot analysis reveals the presence of 3-OST-1 message in different kinds of endothelial cells as well as in a mast cell line. These cell types have previously been shown to form HSact and anticoagulant heparin, respectively (6, 8, 55). Three size categories of rodent 3-OST-1 mRNA (about 1.7, 2.3, and 3.3 kb) and a single size species of the human message (about 1.7 kb) were evident. As described above, the mouse forms arise from differential splicing within the 5' untranslated region. Similar size categories are also expressed by rat (RFPEC) endothelial cells, suggesting a similar mechanism of origin. The abundance of each category varies from cell line to cell line, which suggests that a mechanism exists to regulate such differential splicing. The immortalized mouse mast cell line C57.1 expresses high levels of the same three size categories. This suggests that expression of a single 3-OST-1 gene is required for the synthesis of both HSact and anticoagulant heparin.

The 3-OST-1 Sequence Defines a Heparan Sulfotransferase Family

Extensive computer-aided data bank searching revealed 3-OST-1 to be a previously unidentified protein. Its carboxy-terminal 250 residues exhibit low homology (~30% similarity) to many previously identified sulfotransferases, which are typically ~300 residues in length. These include chondroitin-, aryl/phenol-, N-hydroxyarylamine-, alcohol/hydroxysteroid-, flavonol-, and nodulation factor sulfotransferases. We also observed slightly greater homology (~40% similarity) to a functionally unidentified open reading frame of 247 amino acids from Aeromonas salmonicida (GenBank accession number L37077).

More importantly, 3-OST-1 exhibits ~50% similarity with all previously identified forms of the heparan biosynthetic enzyme N-deacetylase/N-sulfotransferase (NST). In particular, extensive homology exists across the entire 250-270 carboxy-terminal residues of these enzymes. It therefore appears that two distinct types of heparan biosynthetic enzyme share a common sulfotransferase structure. Because NST is a bifunctional enzyme, this observation suggests that NST enzymes possess sulfotransferase activity within a ~270-residue carboxy-terminal domain. The deacetylase activity would then be contained within the remaining ~560 luminal residues. Interestingly, the consensus region Lys302-Arg323, which encompasses the presumptive cysteine-bridged peptide loop described above, is completely conserved at 12 of its 22 positions (including both cysteines) among all 3-OST-1 and NST species.
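A statement such as "12 of the 22 residues are completely conserved" comes from scanning the columns of a multiple alignment for positions where every sequence carries the same residue. The sketch below does this for pre-aligned, equal-length strings, with gaps written as "-". The function name and the toy peptides are hypothetical, and the real 3-OST-1/NST alignment is not reproduced here.

```python
from typing import List


def conserved_positions(aligned: List[str]) -> List[int]:
    """0-based alignment columns where every sequence has the same residue.

    A column containing a gap ('-') is never counted as conserved.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be pre-aligned to equal length")
    conserved = []
    for col in range(length):
        residues = {s[col] for s in aligned}
        if len(residues) == 1 and "-" not in residues:
            conserved.append(col)
    return conserved


# Hypothetical aligned fragments (not the actual 3-OST-1/NST sequences).
fragments = ["KCAVC", "KCGVC", "RCAVC"]
print(conserved_positions(fragments))  # [1, 3, 4]
```

The number of fully conserved positions in a region is then `len(conserved_positions(...))`.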

Identification and Molecular Cloning of 3-OST-2, 3-OST-3A, 3-OST-3B and 3-OST-4

The 3-OST-1 protein exhibits a COOH-terminal region of ~260 residues that was identified as a sulfotransferase (ST) domain based on its homology to all known sulfotransferases. The National Center for Biotechnology Information data bank of expressed sequence tags (ESTs) was searched with amino acid sequences of the ST domain from the human 3-OST-1 cDNA. The search revealed seven human cDNAs encoding three novel related species. These forms were subsequently designated 3-OST-2 (I.M.A.G.E. Consortium (LLNL) Clone ID c-20dl0), 3-OST-3 (Clone ID 284542) and 3-OST-4 (Clone IDs HIBCX69, IB727, 166466, 23279, and c-3ie01). The EST clones were obtained from the TIGR/ATCC Special Collection, and their inserts were completely sequenced, which showed that all clones were of partial length.

To obtain full-length clones, isoform-specific probes were generated from the EST clones and used to screen λ TriplEx human cDNA libraries. Seven additional 3-OST-2 cDNAs and 4 additional 3-OST-4 cDNAs were isolated from a brain library, and 8 new 3-OST-3 cDNAs were recovered from a liver library. The cDNA inserts were completely sequenced. This revealed the full-length form of 3-OST-2 as well as 2 distinct full-length forms of 3-OST-3 (3-OST-3A and 3-OST-3B). The additional 3-OST-4 clones were also of partial length.

3-OST-2, 3-OST-3A, 3-OST-3B and 3-OST-4 Protein Structures and Activities
The 3-OST-2, 3-OST-3A, and 3-OST-3B proteins are 367, 406, and 390 amino acids in length, respectively. All three proteins conform to the architecture of a type II integral membrane protein. These proteins and the partial-length 3-OST-4 share a common (85% similarity) ST domain region of ~260 amino acids at their COOH termini. To characterize the encoded HS sulfotransferase activities, the 3-OST-2, 3-OST-3A, and 3-OST-3B cDNAs were individually expressed in COS-7 cells.

Analysis of transfected cell extracts demonstrated that each enzyme transfers sulfate specifically to the 3-O position of glucosamine residues within HS, but with distinct specificities. 3-OST-2 preferentially sulfates regions containing GlcA 2S→GlcNS to generate GlcA 2S→GlcNS 3S. In contrast, both 3-OST-3A and 3-OST-3B recognize regions containing IdoA 2S→GlcNS to generate IdoA 2S→GlcNS 3S.
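The isoform specificities described above can be summarized as a lookup on the disaccharide that contains the acceptor glucosamine. The sketch below encodes the preferred motifs as a small table. The 3-OST-1 entry (GlcA→GlcNS ±6S) comes from the claims of this document. The data model, the function name, and the simplification that only N-sulfated glucosamine is an acceptor (ignoring 6-O-sulfation) are assumptions. This is a summary of stated preferences, not a model of enzyme kinetics.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Disaccharide:
    """Uronic acid -> glucosamine unit whose glucosamine may be 3-O-sulfated."""
    uronic: str       # "GlcA" or "IdoA"
    uronic_2s: bool   # True if the uronic acid is 2-O-sulfated
    glcn_n: str       # glucosamine N-substituent: "NS" or "NAc"


# Preferred motifs as described in this document (simplified assumption:
# an N-sulfated glucosamine acceptor is required; 6-O-sulfation is ignored).
PREFERENCES: Dict[Tuple[str, bool], List[str]] = {
    ("GlcA", False): ["3-OST-1"],
    ("GlcA", True): ["3-OST-2"],
    ("IdoA", True): ["3-OST-3A", "3-OST-3B"],
}


def candidate_isoforms(d: Disaccharide) -> List[str]:
    """Isoforms expected to 3-O-sulfate the glucosamine of `d`."""
    if d.glcn_n != "NS":
        return []
    return list(PREFERENCES.get((d.uronic, d.uronic_2s), []))


print(candidate_isoforms(Disaccharide("IdoA", True, "NS")))  # ['3-OST-3A', '3-OST-3B']
```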
Expression Patterns Indicate Biological Function 

Northern blot analysis provided insight into the biologic function of these novel enzymes. 3-OST-4 is expressed exclusively in the brain. 3-OST-2 mRNA occurs predominantly in the brain, with minor levels also found in heart, lung, skeletal muscle and placenta. 3-OST-3 forms occur in virtually all tissues: levels are barely detectable in brain, low in heart, lung, skeletal muscle and kidney, and extremely high in liver and placenta. Thus, 3-OST-2 and 3-OST-4 appear to be the brain counterparts of 3-OST-3.

The product of 3-OST-3 (IdoA 2S→GlcNS 3S) has previously been shown to be extremely abundant in HSPGs isolated from the glomerular basement membrane (GBM) of the kidney. These HSPGs are critical in regulating the permselectivity of the GBM. They do so through interactions with extracellular matrix components that regulate the pore size of the matrix. The liver, placenta, and kidney glomerulus are all responsible for filtering macromolecular components from blood, and all exhibit high 3-OST-3 expression. It therefore appears that 3-OST-3 serves a common function in each setting: regulating macromolecular permeability. In this functional regard, the high brain expression of 3-OST-2 and 3-OST-4 correlates with the major molecular permeability barrier of the central nervous system, the blood-brain barrier.
Therapeutic Utilities 

The 3-OST heparan biosynthetic enzymes may be produced by recombinant expression of the isolated cDNAs and used to generate novel glycosaminoglycan drugs of specific structure through in vitro biochemical synthesis. Specifically, 3-OST-1 may be used to generate anticoagulant pentasaccharides. These may be administered subcutaneously to treat thrombotic disorders such as deep vein thrombosis and pulmonary embolism. The 3-OST-1 enzyme may also be used to generate an orally absorbable form of pentasaccharide from an appropriate carbohydrate substrate linked to a hydrophobic group.

In an analogous fashion, 3-OST-2, 3-OST-3 and 3-OST-4 may be used to generate specific glycosaminoglycan products. These products may serve as therapeutics that alter the macromolecular permeability of various vascular beds. Drugs that reduce capillary permeability may, at the very least, be used to treat the following:
(1) microproteinuria and macroproteinuria of renal diseases, including diabetic nephropathy and the various forms of glomerulonephritis;
(2) neoplastic growths, by limiting nutrient supply to tumors; and
(3) inflammatory diseases in which macromolecular constituents of the plasma are required to initiate and maintain a localized inflammation.

Conversely, drugs that enhance capillary permeability may be used for two purposes. First, they may serve as an adjunctive treatment to facilitate pharmacological access to vascular beds that exhibit highly selective drug entry, such as the blood-brain barrier and the placental barrier. Second, they may enhance nutrient supply to under-perfused tissues such as the myocardium after an infarct.

Specific heparan sulfate structures regulate additional biologic processes by interacting with numerous protein effector molecules. These include:
- growth and differentiation factors (e.g., FGF family members, HB-EGF, HGF/SF, interferon γ, PDGF, SDGF, and VEGF/VPF);
- chemokines (e.g., MIP-1β, RANTES, and GRO);
- receptors (e.g., TGF-β receptors);
- mast cell proteases;
- protease inhibitors (e.g., AT, heparin cofactor II, leuserpin, plasminogen activator inhibitor-1, and protease nexins);
- degradative enzymes (e.g., elastase, acetylcholinesterase, extracellular superoxide dismutase, thrombin, tissue plasminogen activator, lipoprotein lipase, hepatic and pancreatic triglyceride lipase, and cholesterol esterase);
- apolipoproteins (e.g., apoB and apoE);
- matrix components (e.g., fibronectin, wnt-1, interstitial collagens, laminin, pleiotrophin, tenascin, thrombospondin, and vitronectin);
- viral coat proteins (e.g., gC and gB of HSV types I and II, gC-II of CMV, and gp120 of HIV);
- nuclear proteins (e.g., c-fos, c-jun, RNA and DNA polymerases, and steroid receptors);
- cellular adhesion molecules (e.g., L-selectin, P-selectin, PECAM-1, and N-CAM); and
- other molecules (e.g., HB-GAM/pleiotrophin, amphoterin, and PF4).

Using routine methods (e.g., site-directed mutagenesis), the available 3-OST cDNAs may be selectively mutated to alter their substrate recognition properties. The resulting enzymes would generate novel glycosaminoglycan structures that modulate the biologic processes regulated by the effector molecules listed above. Novel drugs may therefore also be biochemically synthesized with recombinantly expressed mutated enzymes. Such substances may serve to:
(1) enhance growth or regeneration of specific cell types, such as the endothelial cells of the heart after infarction or neurons in neurodegenerative diseases;
(2) suppress undesirable cell growth in conditions such as cancer (either directly, by acting on the cancer cells, or indirectly, by preventing endothelial cells from neovascularizing the tumor), atherosclerosis (by preventing smooth muscle cell growth), and inflammatory diseases characterized by cellular proliferation;
(3) prevent metastasis of tumors by modulating cell/matrix interactions;
(4) reduce the destructive side effects of inflammatory reactions by inhibiting degradative enzymes or by activating inhibitory molecules (e.g., protease inhibitors), which may be directly or indirectly protective by limiting extravasation of lymphocytes;
(5) modulate serum lipid levels by enhancing or reducing the cellular or tissue uptake or degradation of specific lipoprotein classes;
(6) treat viral infections by preventing viral entry into cells; and
(7) facilitate axon regeneration after nerve severing.

Bacterial expression of 3-OST-1. The human and mouse 3-OST-1 proteins have been expressed in E. coli as active, soluble protein using the pET system from NOVAGEN (Madison, WI). The human and mouse 3-OST-1 cDNAs were PCR amplified with Pfu DNA polymerase, using purified cloned plasmids as template. The primers were designed to amplify a cDNA fragment that starts in frame immediately after the native signal sequence and includes the native translational termination codon. The PCR primers were also designed to include restriction sites that facilitate cloning into the vectors described below in the correct transcriptional/translational reading frames. 3-OST-1 was cloned into vectors pET12a, pET15b and pET28a according to the manufacturer's instructions. This places the 3-OST-1 cDNA downstream of a strong, inducible T7 promoter and provides an efficient Shine-Dalgarno sequence at the appropriate distance from the initiator methionine of the construct.
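The primer design described above can be sketched as follows. The forward primer anneals at the first codon after the signal sequence, which is codon 21 for 3-OST-1 given the cleavage site between residues 20 and 21. The reverse primer anneals over the native stop codon. Both primers carry 5' restriction sites. The specific sites used are not stated above, so the sketch assumes an NdeI site (CATATG) on the forward primer, whose ATG then serves as the in-frame start codon, and a BamHI site (GGATCC) on the reverse primer. The 2-nt "GC" clamp, the annealing length, the function names and the toy sequence are likewise illustrative assumptions.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase/lowercase DNA string (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def mature_orf_primers(cds: str, signal_codons: int,
                       fwd_site: str = "CATATG",   # NdeI (assumed)
                       rev_site: str = "GGATCC",   # BamHI (assumed)
                       anneal_nt: int = 21,
                       clamp: str = "GC") -> Tuple[str, str]:
    """Return (forward, reverse) primers for an ORF lacking its signal sequence.

    With the default NdeI site, the ATG at the end of CATATG becomes the
    start codon and the mature coding sequence follows it in frame.
    The reverse primer anneals over the native stop codon.
    """
    cds = cds.upper()
    if len(cds) % 3 or cds[-3:] not in STOP_CODONS:
        raise ValueError("cds must be a complete ORF ending in a stop codon")
    mature = cds[3 * signal_codons:]  # drop codons 1..signal_codons
    forward = clamp + fwd_site + mature[:anneal_nt]
    reverse = clamp + rev_site + reverse_complement(mature[-anneal_nt:])
    return forward, reverse


# Hypothetical toy ORF with a 2-codon "signal"; for 3-OST-1, signal_codons=20.
fwd, rev = mature_orf_primers("ATGAAACACGGTTGA", signal_codons=2, anneal_nt=6)
print(fwd, rev)  # GCCATATGCACGGT GCGGATCCTCAACC
```

A real design would also check primer melting temperature and confirm that the chosen sites do not occur within the insert.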

IPTG induction at room temperature gives good yields of active protein. The specific activity appears to be lower than that of purified native or baculovirus/Sf9-produced material. The exact magnitude of this reduction is unclear at present, but it may be 10- to 1000-fold. The presently preferred purification scheme is:
(1) induction at 22 °C;
(2) sonication of the bacteria, centrifugation to remove inclusion bodies and cell debris, and purification of the crude bacterial sonicate on heparin Sepharose as described elsewhere;
(3) PAP column chromatography; and
(4) gel permeation chromatography.
Step (4) is needed only to obtain monomeric, pure 3-OST-1, not for preparing active protein.
CLAIMS

What is claimed is: 

1. An isolated nucleic acid encoding at least a functional fragment of a 3-OST protein.

2. An isolated nucleic acid as in claim 1 wherein said nucleic acid encodes a 3-OST protein comprising a mature 3-OST-1 protein selected from the group consisting of mature murine 3-OST-1 and mature human 3-OST-1.

3. An isolated nucleic acid as in claim 1 wherein said nucleic acid encodes a 3-OST protein comprising a protein selected from the group consisting of 3-OST-1, 3-OST-2, 3-OST-3A, 3-OST-3B, 3-OST-4, and ce3-OST.

4. An isolated nucleic acid as in claim 1 wherein said nucleic acid encodes a 3-O-sulfotransferase domain of a 3-OST protein selected from the group consisting of 3-OST-1, 3-OST-2, 3-OST-3A, 3-OST-3B, 3-OST-4, and ce3-OST.

5. An isolated nucleic acid as in claim 1 wherein said nucleic acid comprises a nucleotide sequence selected from nucleotide sequences within:
(a) SEQ ID NO: 1;
(b) SEQ ID NO: 3;
(c) SEQ ID NO: 5;
(d) SEQ ID NO: 7;
(e) SEQ ID NO: 9;
(f) SEQ ID NO: 11;
(g) a sequence having at least 60% nucleotide sequence identity with at least one of (a)-(f) and encoding a functional fragment having sequence-specific HS binding affinity or 3-O-sulfotransferase activity; and
(h) a sequence differing from a sequence of (a)-(g) only by the substitution of synonymous codons.

6. An isolated nucleic acid as in claim 1 wherein said nucleic acid comprises a nucleotide sequence encoding a polypeptide selected from the group consisting of:
(a) residues 21-52, 260-269, 250-276, 53-311, or 21-307 of SEQ ID NO: 2;
(b) residues 21-48, 256-265, 246-272, 49-307, or 21-303 of SEQ ID NO: 4;
(c) residues 42-109, 313-325, 303-332, or 110-367 of SEQ ID NO: 6;
(d) residues 44-147, 351-363, 341-370, or 148-406 of SEQ ID NO: 8;
(e) residues 66-132, 336-348, 326-355, or 133-390 of SEQ ID NO: 10;
(f) residues 396-408, 386-415, or 207-456 of SEQ ID NO: 12;
(g) residues 240-250, 230-257, or 23-291 of SEQ ID NO: 15;
(h) a sequence having at least 60% amino acid sequence similarity with at least one of (a)-(g) and encoding a functional fragment having sequence-specific HS binding affinity or 3-O-sulfotransferase activity; and
(i) a sequence comprising a chimera of at least two of sequences (a)-(h).

7. An isolated nucleic acid comprising at least 16 consecutive nucleotides of a nucleotide sequence selected from the group consisting of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, and SEQ ID NO: 11.

8. A host cell transformed with a nucleic acid of any one of claims 1-7, or a descendant thereof.

9. A host cell as in claim 8 wherein said host cell is selected from the group consisting of bacterial cells, yeast cells, and insect cells.

10. A host cell as in claim 8 wherein said host cell is selected from the group consisting of somatic cells, fetal cells, embryonic stem cells, zygotes, gametes, germ line cells, and transgenic animal cells.

11. A host cell as in claim 8 wherein said cell is a mammalian cell.

12. A host cell as in claim 11 wherein said cell is selected from the group consisting of COS-7 cells, CHO, murine primary cardiac microvascular endothelial cells (CME), murine mast cell line C57.1, human primary endothelial cells of umbilical vein (HUVEC), F9 embryonal carcinoma cells, rat fat pad endothelial cells (RFPEC), L cells, and cells derived from the transgenic animals of the invention.

13. A substantially pure protein preparation comprising at least a functional fragment of a 3-OST protein.

14. A substantially pure protein preparation as in claim 13 wherein said 3-OST protein is selected from the group consisting of mature murine 3-OST-1 and mature human 3-OST-1.

15. A substantially pure protein as in claim 13 wherein said 3-OST protein is selected from the group consisting of 3-OST-1, 3-OST-2, 3-OST-3A, 3-OST-3B, 3-OST-4, and ce3-OST.

16. A substantially pure protein preparation as in claim 13 wherein said functional fragment comprises a 3-O-sulfotransferase domain of a 3-OST protein selected from the group consisting of 3-OST-1, 3-OST-2, 3-OST-3A, 3-OST-3B, 3-OST-4, and ce3-OST.

17. A substantially pure protein preparation as in claim 13 wherein said functional fragment comprises an amino acid sequence selected from amino acid sequences within:
(a) SEQ ID NO: 2;
(b) SEQ ID NO: 4;
(c) SEQ ID NO: 6;
(d) SEQ ID NO: 8;
(e) SEQ ID NO: 10;
(f) SEQ ID NO: 12;
(g) SEQ ID NO: 15;
(h) a sequence having at least 60% amino acid similarity with at least one of (a)-(g) and having sequence-specific HS binding affinity or 3-O-sulfotransferase activity; and
(i) a sequence comprising a chimera of at least two of sequences (a)-(h).

18. A substantially pure protein preparation as in claim 13 wherein said functional fragment comprises an amino acid sequence selected from the group consisting of:
(a) residues 21-52, 260-269, 250-276, 53-311, or 21-307 of SEQ ID NO: 2;
(b) residues 21-48, 256-265, 246-272, 49-307, or 21-303 of SEQ ID NO: 4;
(c) residues 42-109, 313-325, 303-332, or 110-367 of SEQ ID NO: 6;
(d) residues 44-147, 351-363, 341-370, or 148-406 of SEQ ID NO: 8;
(e) residues 66-132, 336-348, 326-355, or 133-390 of SEQ ID NO: 10;
(f) residues 396-408, 386-415, or 207-456 of SEQ ID NO: 12;
(g) residues 240-250, 230-257, or 23-291 of SEQ ID NO: 15;
(h) a sequence having at least 60% amino acid sequence similarity with at least one of (a)-(g) and encoding a functional fragment having sequence-specific HS binding affinity or 3-O-sulfotransferase activity; and
(i) a sequence comprising a chimera of at least two of sequences (a)-(h).
19. A method of 3-O-sulfating saccharide residues within a preparation of glycosaminoglycan or proteoglycan polysaccharides comprising:

contacting said preparation with at least a 3-O-sulfotransferase domain of a 3-OST protein in the presence of a sulfate donor under conditions which permit sulfation of said residues;

wherein said 3-OST protein is selected from the group consisting of 3-OST-1, 3-OST-2, 3-OST-3A, 3-OST-3B, 3-OST-4, ce3-OST, and conservative substitution variants or chimeras thereof.

20. A method of 3-O-sulfating saccharide residues within a preparation of glycosaminoglycan or proteoglycan polysaccharides, wherein said polysaccharides include a polysaccharide sequence of GlcA→GlcNS ±6S, comprising:

contacting said preparation with a 3-OST-1 protein in the presence of a sulfate donor under conditions which permit said 3-OST-1 to convert said GlcA→GlcNS ±6S sequence to GlcA→GlcNS 3S ±6S;

wherein the 3-OST-1 protein is selected from the group consisting of murine 3-OST-1, human 3-OST-1, mature murine 3-OST-1, mature human 3-OST-1, a functional fragment of a 3-OST-1 having 3-O-sulfotransferase activity, a conservative substitution variant of 3-OST-1 having 3-O-sulfotransferase activity, and a chimeric 3-OST-1 having 3-O-sulfotransferase activity.

21. A method as in claim 20, wherein said GlcA→GlcNS ±6S polysaccharide sequence comprises a part of a polysaccharide sequence selected from the group consisting of:
(a) GlcA→GlcNS ±6S→IdoA 2S→GlcNS ±6S;
(b) IdoA→GlcNAc 6S→GlcA→GlcNS ±6S→IdoA 2S→GlcNS 6S;
(c) IdoA→GlcNS 6S→GlcA→GlcNS ±6S→IdoA 2S→GlcNS 6S;
(d) IdoA→GlcNAc→GlcA→GlcNS ±6S→IdoA 2S→GlcNS 6S;
(e) IdoA→GlcNS→GlcA→GlcNS ±6S→IdoA 2S→GlcNS 6S;
(f) IdoA→GlcNAc 6S→GlcA→GlcNS ±6S→IdoA 2S→GlcNS; and
(g) IdoA→GlcNS 6S→GlcA→GlcNS ±6S→IdoA 2S→GlcNS.

22. A method of 3-O-sulfating saccharide residues within a preparation of glycosaminoglycan or proteoglycan polysaccharides, wherein said polysaccharides include a polysaccharide sequence of GlcA 2S→GlcNS, comprising:

contacting said preparation with a 3-OST-2 protein in the presence of a sulfate donor under conditions which permit said 3-OST-2 to convert said GlcA 2S→GlcNS sequence to GlcA 2S→GlcNS 3S;

wherein the 3-OST-2 protein is selected from the group consisting of 3-OST-2, a functional fragment of a 3-OST-2 having 3-O-sulfotransferase activity, a conservative substitution variant of 3-OST-2 having 3-O-sulfotransferase activity, and a chimeric 3-OST-2 having 3-O-sulfotransferase activity.

23. A method as in claim 22, wherein said GlcA 2S→GlcNS polysaccharide sequence comprises a part of a GlcNS→GlcA 2S→GlcNS polysaccharide sequence.

24. A method of 3-O-sulfating saccharide residues within a preparation of glycosaminoglycan or proteoglycan polysaccharides, wherein said polysaccharides include a polysaccharide sequence of IdoA 2S→GlcNS, comprising:

contacting said preparation with a 3-OST-3 protein in the presence of a sulfate donor under conditions which permit said 3-OST-3 to convert said IdoA 2S→GlcNS sequence to IdoA 2S→GlcNS 3S;

wherein the 3-OST-3 protein is selected from the group consisting of 3-OST-3A, 3-OST-3B, a functional fragment of a 3-OST-3 having 3-O-sulfotransferase activity, a conservative substitution variant of 3-OST-3 having 3-O-sulfotransferase activity, and a chimeric 3-OST-3 having 3-O-sulfotransferase activity.

25. A method as in claim 24, wherein said IdoA 2S→GlcNS polysaccharide sequence comprises a part of a GlcNS→IdoA 2S→GlcNS polysaccharide sequence.

26. A method for enriching the AT-binding fraction in a preparation of heparan sulfates, wherein said preparation includes a polysaccharide sequence of GlcA→GlcNS ±6S, comprising:

contacting said preparation with 3-OST-1 protein in the presence of a sulfate donor under conditions which permit said 3-OST-1 to convert said GlcA→GlcNS ±6S sequence to GlcA→GlcNS 3S ±6S, thereby increasing the fraction of AT-binding heparan sulfates.

27. A method for converting HSact precursor to HSact in a preparation of heparan sulfates, wherein said preparation includes HSact precursor polysaccharides including a sequence of GlcA→GlcNS ±6S, comprising:

contacting said preparation with 3-OST-1 protein in the presence of a sulfate donor under conditions which permit said 3-OST-1 to convert said GlcA→GlcNS ±6S sequence to GlcA→GlcNS 3S ±6S, thereby converting HSact precursor to HSact.

28. A method as in any one of claims 19-28 wherein said sulfate donor is PAPS.

29. A non-human animal model, wherein a genome of said animal, or an ancestor thereof, wherein said recombinant construct has introduced a modification into said genome, said modification selected from the group consisting of insertion of a nucleic acid encoding at least a functional fragment of a conspecific wild type 3-OST protein, insertion of a nucleic acid encoding at least a functional fragment of a transpecific allelic variant of the 3-OST sequences, insertion of nucleic acid encoding at least a functional fragment of an allelic variant of 3-OST sequence, inactivation of an endogenous 3-OST gene, and insertion by homologous recombination of a reporter gene coupled to 3-OST transcriptional elements.

30. An animal as in claim 29 wherein said modification is insertion of nucleic acid encoding at least a functional fragment of wild type 3-OST selected from the group consisting of the SPLAG-domain, the cysteine-binding peptide loop, and the ~260-residue ST domain.

31. An animal as in claim 29 wherein said animal is selected from the group consisting of rats, mice, hamsters, guinea pigs, rabbits, dogs, cats, goats, sheep, pigs, and non-human primates.

32. An animal as in claim 29 wherein said animal is an invertebrate.

1 33. A method of producing antibodies which selectively bind to a 3-OST protein comprising 

2 the steps of 

3 administering an immunogenically effective amount of a 3-OST epitope to an animal; 

4 allowing said animal to produce antibodies to said epitope; and 

5 obtaining said antibodies from said animal or from a cell culture derived therefrom. 

1 34. A substantially pure preparation of antibody which selectively binds to an epitope of a 3- 

2 OST protein. 

1 35. A substantially pure preparation of an antibody as claimed in 34 wherein said antibody 

2 selectively binds to at least a fragment of 3-OST. 

1 36. A cell line producing an antibody of any one of the claims 34. 
37. A method for identifying compounds which can modulate the expression of a 3-OST gene
comprising the steps of:

providing a cell expressing a nucleic acid under the control of a 3-OST regulatory
sequence;

contacting said cell with at least one candidate compound; and

assaying for a change in the expression of said nucleic acid.

38. The method of claim 37, wherein said nucleic acid comprises a marker gene and a 3-OST
gene.

39. The method of claim 37, wherein said assaying step comprises detecting a change in
3-OST mRNA level.

40. The method of claim 37, wherein said assaying step comprises detecting a change in
3-OST protein encoded by said nucleic acid.
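Claims 37 to 40 do not specify how a "change in expression" is scored. One common way to summarize a marker-gene readout is a fold change relative to an untreated control. The sketch below assumes that scoring scheme, a two-fold threshold (`min_abs_log2_fc=1.0`), and the hypothetical function name `flag_modulators`; none of these come from the claims.

```python
import math
from typing import Dict, List


def flag_modulators(control_signal: float, treated: Dict[str, float],
                    min_abs_log2_fc: float = 1.0) -> List[str]:
    """Return candidate compounds whose marker signal moves at least 2**min_abs_log2_fc fold
    (up or down) relative to the untreated control, sorted by name."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    hits = []
    for name, signal in treated.items():
        if signal <= 0:
            # Complete loss of signal is treated as maximal repression.
            hits.append(name)
        elif abs(math.log2(signal / control_signal)) >= min_abs_log2_fc:
            hits.append(name)
    return sorted(hits)


readings = {"compound_A": 210.0, "compound_B": 120.0, "compound_C": 40.0, "compound_D": 0.0}
print(flag_modulators(100.0, readings))  # ['compound_A', 'compound_C', 'compound_D']
```

Using the absolute log2 ratio treats a two-fold induction and a two-fold repression symmetrically, so both kinds of modulator are flagged.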

41. A method of determining partial sequence information for complex polysaccharides
comprising the steps of:

contacting a first sample of polysaccharide with at least one ligand which binds
polysaccharides in a sequence-specific manner;

contacting the resulting polysaccharide-ligand complex with at least one agent that
modifies complex polysaccharides;

contacting a second sample of polysaccharide with the same modifying agent; and

comparing said first and second samples for ligand-specific inhibition of modifications
caused by said modifying agent.

42. The method of claim 41, wherein said complex polysaccharide is a glycosaminoglycan.

43. The method of claim 41, wherein said ligand is catalytically inactive.

44. The method of claim 41, wherein said ligand is an inactive 3-OST.

45. The method of claim 41, wherein said agent that modifies polysaccharides is selected from
the group consisting of epimerases, lyases, sulfotransferases, N-acetyltransferases, and
N-deacetylases.

46. The method of claim 45, wherein said modifying agent is a sequence-specific degrading
agent.

47. The method of claim 45, wherein said modifying agent is a non-sequence-specific
degrading agent.

48. The method of claim 46, wherein said degrading agent is a lyase.

49. The method of claim 47, wherein said non-sequence-specific degrading agent is nitrous acid.

50. The method of claim 45, further comprising affinity purifying said modified first and
second samples.

51. The method of claim 45, wherein the step of comparing includes a comparison of size
profiles.
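The comparison step of claims 41 and 51 can be sketched as a comparison of two fragment-size profiles, one from the ligand-protected digest and one from the unprotected digest. In this hypothetical sketch each profile maps a fragment size (for example, degree of polymerization) to a relative abundance. The 50% depletion cut-off, the profile format, and the name `compare_size_profiles` are assumptions, not part of the claims.

```python
from typing import Dict, List, Tuple

# Fragment size (e.g. degree of polymerization) -> relative abundance.
Profile = Dict[int, float]


def compare_size_profiles(unprotected: Profile, protected: Profile,
                          min_drop: float = 0.5) -> Tuple[List[int], List[int]]:
    """Return (sizes depleted by at least `min_drop` in the protected digest,
    sizes detected only in the protected digest)."""
    depleted = []
    for size, amount in unprotected.items():
        if amount <= 0:
            continue
        remaining = protected.get(size, 0.0)
        if (amount - remaining) / amount >= min_drop:
            depleted.append(size)
    protected_only = [size for size, amount in protected.items()
                      if amount > 0 and unprotected.get(size, 0.0) <= 0]
    return sorted(depleted), sorted(protected_only)


unprotected = {2: 40.0, 4: 30.0, 6: 20.0}
protected = {2: 10.0, 4: 29.0, 6: 20.0, 8: 15.0}
print(compare_size_profiles(unprotected, protected))  # ([2], [8])
```

Under these assumptions, a short fragment that is depleted together with a longer fragment that appears only in the protected digest suggests a cleavage site that the bound ligand shielded.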

52. A method of determining partial sequence information for complex polysaccharides
comprising the steps of:

contacting a first sample of complex polysaccharides with a 3-OST protein in the presence
of a sulfate donor under conditions which permit sulfation by said 3-OST;

contacting said first sample and a second sample with at least one enzyme which cleaves
polysaccharides in a sequence-specific manner; and

determining the size profiles of the resulting fragments.

53. The method of claim 52, wherein the step of determining the size profiles further comprises
comparing said first sample to said second sample cleaved by the same enzymes.

54. The method of claim 52, wherein said enzymes which degrade polysaccharides in a
sequence-specific manner are selected from the group consisting of polysaccharide lyases,
heparinase I, heparinase II, and heparinase III.

55. A method of determining partial sequence information for a sample containing complex
polysaccharides comprising the steps of:

contacting said sample of polysaccharide with a 3-OST protein which lacks enzymatic
function under conditions which permit said 3-OST protein to bind to said polysaccharide
in a sequence-specific manner;

applying said sample to an affinity column;

applying degrading agents to said column; and

analyzing the resulting degradation products.

56. The method of claim 55, further comprising repeating the steps of applying degrading agents
and analyzing, using a series of different sequence-specific polysaccharide cleavage enzymes.

57. An isolated nucleic acid comprising a 5' untranslated regulatory region of a 3-OST gene
operably joined to a marker gene.



WO 99/22005 



PCT/US98/22597 




58. A host cell transformed with the isolated nucleic acid of claim 57, or a descendant thereof.

59. A method of identifying compounds capable of modulating the expression of a 3-OST 
gene comprising contacting a candidate compound with the transformed host cell of claim 58 and 
assaying for changes in expression of said marker. 

60. A method as in claim 59, wherein said regulatory region comprises the 5' untranslated
region of SEQ ID NO: 16.
SEQUENCE LISTING

<110> ROSENBERG, Robert D 
SHWORAK, Nicholas W 
LIU, Jian 

FRITZE, Linda M. S. 
SCHWARTZ, John J 
ZHANG, Lijuan 

Massachusetts Institute of Technology 

<120> HEPARAN SULFATE D-GLUCOSAMINYL 3-O-SULFOTRANSFERASES,
AND USES THEREFOR

<130> MIT-087PC

<140>
<141>

<150> USSN 60/065,437
<151> 1997-10-31

<150> USSN 60/062,762
<151> 1997-10-24

<160> 16

<170> PatentIn Ver. 2.0

<210> 1

<211> 1685

<212> DNA

<213> Mus musculus

<220>
<221> CDS

<222> (323)..(1255)
<223> mouse 3-OST-1

<400> 1 

tgcattgcaa tgtgaagtgt tcctgaataa acctgcttga agaaggacaa cgtggtgttg 60 

cgtctttcct gctggtcggg gtggaataga cacctcccct ttttaacttg ggtgacctca 120 

tgaacataaa agaacttaaa ggtagcaagc catggactta aagtaggctg accttgaact 180 

cagagatctt cttggcaatg tctctggaga ttaaagtaat tggcaactgg agatactcat 240 

gttccagtaa tcaagaggga gccttgctgc tacttcatga tccaggcgcg tgtggcccag 300

tgaagtccct gagctgtaca gc atg acc ttg ctg ctc ctg ggt gcg gtg ctg 352
Met Thr Leu Leu Leu Leu Gly Ala Val Leu
1 5 10

ctg gtg gcc cag ccc cag ctt gtg cat tcc cac ccg gct gct cct ggc 400
Leu Val Ala Gln Pro Gln Leu Val His Ser His Pro Ala Ala Pro Gly
15 20 25

ccg ggg ctc aaa cag cag gag ctt ctg agg aag gtg att att ctc cca 448
Pro Gly Leu Lys Gln Gln Glu Leu Leu Arg Lys Val Ile Ile Leu Pro
30 35 40

gag gac acc gga gaa ggc aca gca tcc aat ggt tcc aca cag cag ctg 496
Glu Asp Thr Gly Glu Gly Thr Ala Ser Asn Gly Ser Thr Gln Gln Leu
45 50 55

cca cag acc atc atc att ggg gtg cgc aag ggt ggt acc cga gcc ctg 544
Pro Gln Thr Ile Ile Ile Gly Val Arg Lys Gly Gly Thr Arg Ala Leu
60 65 70

cta gag atg ctc agc ctg cat cct gat gtt gct gca gct gaa aac gag 592
Leu Glu Met Leu Ser Leu His Pro Asp Val Ala Ala Ala Glu Asn Glu
75 80 85 90

gtc cat ttc ttt gac tgg gag gag cat tac agc caa ggc ctg ggc tgg 640
Val His Phe Phe Asp Trp Glu Glu His Tyr Ser Gln Gly Leu Gly Trp
95 100 105

tac ctc acc cag atg ccc ttc tcc tcc cct cac cag ctc acc gtg gag 688
Tyr Leu Thr Gln Met Pro Phe Ser Ser Pro His Gln Leu Thr Val Glu
110 115 120

aag aca ccc gcc tat ttc act tcg ccc aaa gtg cct gag aga atc cac 736
Lys Thr Pro Ala Tyr Phe Thr Ser Pro Lys Val Pro Glu Arg Ile His
125 130 135

agc atg aac ccc acc atc cgc ctg ctg ctt atc ctg agg gac cca tca 784
Ser Met Asn Pro Thr Ile Arg Leu Leu Leu Ile Leu Arg Asp Pro Ser
140 145 150

gag cgc gtg ctg tcc gac tac acc cag gtg ttg tac aac cac ctt cag 832
Glu Arg Val Leu Ser Asp Tyr Thr Gln Val Leu Tyr Asn His Leu Gln
155 160 165 170

aag cac aag ccc tat cca ccc att gag gac ctc cta atg cgg gac ggt 880
Lys His Lys Pro Tyr Pro Pro Ile Glu Asp Leu Leu Met Arg Asp Gly
175 180 185

cgg ctg aac ctg gac tac aag gct ctc aac cgc agc ctg tac cat gca 928
Arg Leu Asn Leu Asp Tyr Lys Ala Leu Asn Arg Ser Leu Tyr His Ala
190 195 200

cac atg ctg aac tgg ctg cgt ttt ttc ccg ttg ggc cac atc cac att 976
His Met Leu Asn Trp Leu Arg Phe Phe Pro Leu Gly His Ile His Ile
205 210 215

gtg gat ggc gac cgc ctc atc aga gac cct ttc cct gag atc cag aag 1024
Val Asp Gly Asp Arg Leu Ile Arg Asp Pro Phe Pro Glu Ile Gln Lys
220 225 230

gtc gaa aga ttc ctg aag ctt tct cca cag atc aac gcc tcg aac ttc 1072
Val Glu Arg Phe Leu Lys Leu Ser Pro Gln Ile Asn Ala Ser Asn Phe
235 240 245 250

tac ttt aac aaa acc aag ggc ttc tac tgc ctg cgg gac agt ggc aag 1120
Tyr Phe Asn Lys Thr Lys Gly Phe Tyr Cys Leu Arg Asp Ser Gly Lys
255 260 265

gac cgc tgc tta cac gag tcc aaa ggc cgg gcg cac ccc cag gtg gat 1168
Asp Arg Cys Leu His Glu Ser Lys Gly Arg Ala His Pro Gln Val Asp
270 275 280

ccc aaa cta ctt gat aaa ctg cac gaa tac ttt cat gag cca aat aag 1216
Pro Lys Leu Leu Asp Lys Leu His Glu Tyr Phe His Glu Pro Asn Lys
285 290 295

aaa ttt ttc aag ctc gtg ggc aga aca ttc gac tgg cac tgatttgccg 1265
Lys Phe Phe Lys Leu Val Gly Arg Thr Phe Asp Trp His
300 305 310

tctcctaggc tcgggacttt tcctgttgtt aacttctggt gtacatctga aggggggagg 1325

aaaataattt taaaaaggca tttaagctat aatttatttg taaaacccac aaatgacttc 1385

tgtacagtat tagattcaca gttgccatat atagtagtta tatttttcta cttgttaaat 1445

ggagggcgtt ttgtattgtt tttcatggtt gttaacattg tgtatatgtc tctataatat 1505

gaaggaactt aactattgca ctgaaaaaat aagagatttt ttttttctgg agacctcttt 1565
ttttgttgtt gttgttttaa atataattaa cctgcctcca atccaaaata gctctttgtt 1625 
ttcacctcct tgtcaaatct ataatctttt tctgcttaaa aaatttattg gtattatgga 1685 

<210> 2 

<211> 311 

<212> PRT 

<213> Mus musculus

<400> 2

Met Thr Leu Leu Leu Leu Gly Ala Val Leu Leu Val Ala Gln Pro Gln
1 5 10 15

Leu Val His Ser His Pro Ala Ala Pro Gly Pro Gly Leu Lys Gln Gln
20 25 30

Glu Leu Leu Arg Lys Val Ile Ile Leu Pro Glu Asp Thr Gly Glu Gly
35 40 45

Thr Ala Ser Asn Gly Ser Thr Gln Gln Leu Pro Gln Thr Ile Ile Ile
50 55 60

Gly Val Arg Lys Gly Gly Thr Arg Ala Leu Leu Glu Met Leu Ser Leu
65 70 75 80

His Pro Asp Val Ala Ala Ala Glu Asn Glu Val His Phe Phe Asp Trp
85 90 95

Glu Glu His Tyr Ser Gln Gly Leu Gly Trp Tyr Leu Thr Gln Met Pro
100 105 110

Phe Ser Ser Pro His Gln Leu Thr Val Glu Lys Thr Pro Ala Tyr Phe
115 120 125

Thr Ser Pro Lys Val Pro Glu Arg Ile His Ser Met Asn Pro Thr Ile
130 135 140

Arg Leu Leu Leu Ile Leu Arg Asp Pro Ser Glu Arg Val Leu Ser Asp
145 150 155 160

Tyr Thr Gln Val Leu Tyr Asn His Leu Gln Lys His Lys Pro Tyr Pro
165 170 175

Pro Ile Glu Asp Leu Leu Met Arg Asp Gly Arg Leu Asn Leu Asp Tyr
180 185 190

Lys Ala Leu Asn Arg Ser Leu Tyr His Ala His Met Leu Asn Trp Leu
195 200 205

Arg Phe Phe Pro Leu Gly His Ile His Ile Val Asp Gly Asp Arg Leu
210 215 220

Ile Arg Asp Pro Phe Pro Glu Ile Gln Lys Val Glu Arg Phe Leu Lys
225 230 235 240

Leu Ser Pro Gln Ile Asn Ala Ser Asn Phe Tyr Phe Asn Lys Thr Lys
245 250 255

Gly Phe Tyr Cys Leu Arg Asp Ser Gly Lys Asp Arg Cys Leu His Glu
260 265 270

Ser Lys Gly Arg Ala His Pro Gln Val Asp Pro Lys Leu Leu Asp Lys
275 280 285

Leu His Glu Tyr Phe His Glu Pro Asn Lys Lys Phe Phe Lys Leu Val
290 295 300

Gly Arg Thr Phe Asp Trp His 
305 310 
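The <222> CDS coordinates and the <211> lengths of the PRT entries can be cross-checked arithmetically. In this listing the CDS ranges appear to stop just before the termination codon (for SEQ ID NO: 1, `cac` ends at position 1255 and `tga` follows). The codon count (end - start + 1)/3 should therefore equal the protein length: (1255 - 323 + 1)/3 = 311 for SEQ ID NO: 2. The Python sketch below performs that check and translates a codon string with the standard genetic code. The names `cds_codon_count` and `translate` are hypothetical helpers, not part of any sequence-listing tool.

```python
from itertools import product

# Standard genetic code: codons enumerated in t/c/a/g order, first position varying slowest.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def cds_codon_count(start: int, end: int) -> int:
    """Number of codons in a 1-based, inclusive <222> CDS range; raises if not a whole number."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"CDS length {length} is not a multiple of 3")
    return length // 3


def translate(dna: str) -> str:
    """Translate DNA into one-letter amino acid codes; ambiguous or unreadable codons become 'X'."""
    seq = "".join(dna.split()).lower()  # drop the spaces that group codons in the listing
    usable = len(seq) - len(seq) % 3    # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


print(cds_codon_count(323, 1255))                              # 311
print(translate("atg acc ttg ctg ctc ctg ggt gcg gtg ctg"))    # MTLLLLGAVL
```

The same check gives 307 codons for SEQ ID NO: 3 (119..1039), 367 for SEQ ID NO: 5 (73..1173), and 406 for SEQ ID NO: 7 (799..2016), matching the <211> values of SEQ ID NOs: 4, 6, and 8.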



<210> 3 

<211> 1305 

<212> DNA 

<213> Homo sapiens 

<220> 
<221> CDS 

<222> (119)..(1039)
<223> human 3-OST-1

<400> 3

cgcggctcag taattgaagg cctgaaacgc ccatgtgcca ctgactagga ggcttccctg 60

ctgcggcact tcatgaccca gcggcgcgcg gcccagtgaa gccaccgtgg tgtccagc 118

atg gcc gcg ctg ctc ctg ggc gcg gtg ctg ctg gtg gcc cag ccc cag 166
Met Ala Ala Leu Leu Leu Gly Ala Val Leu Leu Val Ala Gln Pro Gln
1 5 10 15

cta gtg cct tcc cgc ccc gcc gag cta ggc cag cag gag ctt ctg cgg 214
Leu Val Pro Ser Arg Pro Ala Glu Leu Gly Gln Gln Glu Leu Leu Arg
20 25 30

aaa gcg ggg acc ctc cag gat gac gtc cgc gat ggc gtg gcc cca aac 262
Lys Ala Gly Thr Leu Gln Asp Asp Val Arg Asp Gly Val Ala Pro Asn
35 40 45

ggc tct gcc cag cag ttg ccg cag acc atc atc atc ggc gtg cgc aag 310
Gly Ser Ala Gln Gln Leu Pro Gln Thr Ile Ile Ile Gly Val Arg Lys
50 55 60

ggc ggc acg cgc gca ctg ctg gag atg ctc agc ctg cac ccc gac gtg 358
Gly Gly Thr Arg Ala Leu Leu Glu Met Leu Ser Leu His Pro Asp Val
65 70 75 80

gcg gcc gcg gag aac gag gtc cac ttc ttc gac tgg gag gag cat tac 406
Ala Ala Ala Glu Asn Glu Val His Phe Phe Asp Trp Glu Glu His Tyr
85 90 95

agc cac ggc ttg ggc tgg tac ctc agc cag atg ccc ttc tcc tgg cca 454
Ser His Gly Leu Gly Trp Tyr Leu Ser Gln Met Pro Phe Ser Trp Pro
100 105 110

cac cag ctc aca gtg gag aag acc ccc gcg tat ttc acg tcg ccc aaa 502
His Gln Leu Thr Val Glu Lys Thr Pro Ala Tyr Phe Thr Ser Pro Lys
115 120 125

gtg cct gag cga gtc tac agc atg aac ccg tcc atc cgg ctg ctg ctc 550
Val Pro Glu Arg Val Tyr Ser Met Asn Pro Ser Ile Arg Leu Leu Leu
130 135 140

atc ctg cga gac ccg tcg gag cgc gtg cta tct gac tac acc caa gtg 598
Ile Leu Arg Asp Pro Ser Glu Arg Val Leu Ser Asp Tyr Thr Gln Val
145 150 155 160

ttc tac aac cac atg cag aag cac aag ccc tac ccg tcc atc gag gag 646
Phe Tyr Asn His Met Gln Lys His Lys Pro Tyr Pro Ser Ile Glu Glu
165 170 175

ttc ctg gtg cgc gat ggc agg ctc aat gtg gac tac aag gcc ctc aac 694
Phe Leu Val Arg Asp Gly Arg Leu Asn Val Asp Tyr Lys Ala Leu Asn 
180 185 190

cgc agc ctc tac cac gtg cac atg cag aac tgg ctg cgc ttt ttc ccg 742
Arg Ser Leu Tyr His Val His Met Gln Asn Trp Leu Arg Phe Phe Pro
195 200 205

ctg cgc cac atc cac att gtg gac ggc gac cgc ctc atc agg gac ccc 790
Leu Arg His Ile His Ile Val Asp Gly Asp Arg Leu Ile Arg Asp Pro
210 215 220

ttc cct gag atc caa aag gtc gag agg ttc cta aag ctg tcg ccg cag 838
Phe Pro Glu Ile Gln Lys Val Glu Arg Phe Leu Lys Leu Ser Pro Gln
225 230 235 240

atc aat gct tcg aac ttc tac ttt aac aaa acc aag ggc ttt tac tgc 886
Ile Asn Ala Ser Asn Phe Tyr Phe Asn Lys Thr Lys Gly Phe Tyr Cys
245 250 255

ctg cgg gac agc ggc cgg gac cgc tgc tta cat gag tcc aaa ggc cgg 934
Leu Arg Asp Ser Gly Arg Asp Arg Cys Leu His Glu Ser Lys Gly Arg
260 265 270

gcg cac ccc caa gtc gat ccc aaa cta ctc aat aaa ctg cac gaa tat 982
Ala His Pro Gln Val Asp Pro Lys Leu Leu Asn Lys Leu His Glu Tyr
275 280 285

ttt cat gag cca aat aag aag ttc ttc gag ctt gtt ggc aga aca ttt 1030
Phe His Glu Pro Asn Lys Lys Phe Phe Glu Leu Val Gly Arg Thr Phe
290 295 300

gac tgg cac tgatttgcaa taagctaagc tcagaaactt tcctactgta 1079
Asp Trp His
305

agttctggtg tacatctgag gggaaaaaga attttaaaaa agcatttaag gtataattta 1139
tttgtaaaat ccataaagta cttctgtaca gtattagatt cacaattgcc atatatacta 1199
gttatatttt tctacttgtt aaatggaggg cattttgtat tgtttttcat ggttgttaac 1259
attgtgtaat atgtctctat atgaaggaac taaactattt cactga 1305



<210> 4 

<211> 307 

<212> PRT 

<213> Homo sapiens 

<400> 4 

Met Ala Ala Leu Leu Leu Gly Ala Val Leu Leu Val Ala Gln Pro Gln
1 5 10 15

Leu Val Pro Ser Arg Pro Ala Glu Leu Gly Gln Gln Glu Leu Leu Arg
20 25 30

Lys Ala Gly Thr Leu Gln Asp Asp Val Arg Asp Gly Val Ala Pro Asn
35 40 45

Gly Ser Ala Gln Gln Leu Pro Gln Thr Ile Ile Ile Gly Val Arg Lys
50 55 60

Gly Gly Thr Arg Ala Leu Leu Glu Met Leu Ser Leu His Pro Asp Val
65 70 75 80

Ala Ala Ala Glu Asn Glu Val His Phe Phe Asp Trp Glu Glu His Tyr
85 90 95

Ser His Gly Leu Gly Trp Tyr Leu Ser Gln Met Pro Phe Ser Trp Pro
100 105 110

His Gln Leu Thr Val Glu Lys Thr Pro Ala Tyr Phe Thr Ser Pro Lys
115 120 125

Val Pro Glu Arg Val Tyr Ser Met Asn Pro Ser Ile Arg Leu Leu Leu
130 135 140

Ile Leu Arg Asp Pro Ser Glu Arg Val Leu Ser Asp Tyr Thr Gln Val
145 150 155 160

Phe Tyr Asn His Met Gln Lys His Lys Pro Tyr Pro Ser Ile Glu Glu
165 170 175

Phe Leu Val Arg Asp Gly Arg Leu Asn Val Asp Tyr Lys Ala Leu Asn
180 185 190

Arg Ser Leu Tyr His Val His Met Gln Asn Trp Leu Arg Phe Phe Pro
195 200 205

Leu Arg His Ile His Ile Val Asp Gly Asp Arg Leu Ile Arg Asp Pro
210 215 220

Phe Pro Glu Ile Gln Lys Val Glu Arg Phe Leu Lys Leu Ser Pro Gln
225 230 235 240

Ile Asn Ala Ser Asn Phe Tyr Phe Asn Lys Thr Lys Gly Phe Tyr Cys
245 250 255

Leu Arg Asp Ser Gly Arg Asp Arg Cys Leu His Glu Ser Lys Gly Arg
260 265 270

Ala His Pro Gln Val Asp Pro Lys Leu Leu Asn Lys Leu His Glu Tyr
275 280 285 

Phe His Glu Pro Asn Lys Lys Phe Phe Glu Leu Val Gly Arg Thr Phe 
290 295 300 

Asp Trp His 
305 

<210> 5 

<211> 1951 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<222> (73)..(1173)

<223> human 3-OST-2

<400> 5

cgcagggcca cagcagctca gccgccggtg ccccctcgga aaccatgacc cccggcgcgg 60

gcccatggag cc atg gcc tat agg gtc ctg ggc cgc gcg ggg cca cct cag 111
Met Ala Tyr Arg Val Leu Gly Arg Ala Gly Pro Pro Gln
1 5 10

ccg cgg agg gcg cgc agg ctg ctc ttc gcc ttc acg ctc tcg ctc tcc 159
Pro Arg Arg Ala Arg Arg Leu Leu Phe Ala Phe Thr Leu Ser Leu Ser
15 20 25

tgc act tac ctg tgt tac agc ttc ctg tgc tgc tgc gac gac ctg ggt 207
Cys Thr Tyr Leu Cys Tyr Ser Phe Leu Cys Cys Cys Asp Asp Leu Gly
30 35 40 45

cgg agc cgc ctc ctc ggc gcg cct cgc tgc ctc cgc ggc ccc agc gcg 255
Arg Ser Arg Leu Leu Gly Ala Pro Arg Cys Leu Arg Gly Pro Ser Ala
50 55 60 

ggc ggc cag aaa ctt ctc cag aag tcc cgc ccc tgt gat ccc tcc ggg 303
Gly Gly Gln Lys Leu Leu Gln Lys Ser Arg Pro Cys Asp Pro Ser Gly
65 70 75

ccg acg ccc agc gag ccc agc gct ccc agc gcg ccc gcc gcc gcc gtg 351
Pro Thr Pro Ser Glu Pro Ser Ala Pro Ser Ala Pro Ala Ala Ala Val
80 85 90

ccc gcc cct cgc ctc tcc ggt tcc aac cac tcc ggc tca ccc aag ctg 399
Pro Ala Pro Arg Leu Ser Gly Ser Asn His Ser Gly Ser Pro Lys Leu
95 100 105

ggt acc aag cgg ttg ccc caa gcc ctc att gtg ggc gtg aag aag ggg 447
Gly Thr Lys Arg Leu Pro Gln Ala Leu Ile Val Gly Val Lys Lys Gly
110 115 120 125

ggc acc cgg gcc gtg ctg gag ttt atc cga gta cac ccg gac gtg cgg 495
Gly Thr Arg Ala Val Leu Glu Phe Ile Arg Val His Pro Asp Val Arg
130 135 140

gcc ttg ggc acg gaa ccc cac ttc ttt gac agg aac tac ggc cgc ggg 543
Ala Leu Gly Thr Glu Pro His Phe Phe Asp Arg Asn Tyr Gly Arg Gly
145 150 155

ctg gat tgg tac agg agc ctg atg ccc agg acc ctc gag agc cag atc 591
Leu Asp Trp Tyr Arg Ser Leu Met Pro Arg Thr Leu Glu Ser Gln Ile
160 165 170

acg ctg gag aag acg ccc agc tac ttt gtc act caa gag gct cct cga 639
Thr Leu Glu Lys Thr Pro Ser Tyr Phe Val Thr Gln Glu Ala Pro Arg
175 180 185

cgc atc ttc aac atg tcc cga gac acc aag ctg atc gtg gtt gtg cgg 687
Arg Ile Phe Asn Met Ser Arg Asp Thr Lys Leu Ile Val Val Val Arg
190 195 200 205

aac cct gtg acc cgt gcc atc tct gat tac acg cag aca ctc tcc aag 735
Asn Pro Val Thr Arg Ala Ile Ser Asp Tyr Thr Gln Thr Leu Ser Lys
210 215 220

aag ccc gac atc ccg acc ttt gag ggc ctc tcc ttc cgc aac cgc acc 783
Lys Pro Asp Ile Pro Thr Phe Glu Gly Leu Ser Phe Arg Asn Arg Thr
225 230 235

ctg ggc ctg gtg gac gtg tcg tgg aac gcc atc cgc atc ggc atg tac 831
Leu Gly Leu Val Asp Val Ser Trp Asn Ala Ile Arg Ile Gly Met Tyr
240 245 250

gtg ctg cac ctg gag agc tgg ctg cag tac ttc ccg cta gct cag att 879
Val Leu His Leu Glu Ser Trp Leu Gln Tyr Phe Pro Leu Ala Gln Ile
255 260 265

cac ttc gtc agt ggc gag cga ctc atc act gac ccg gcc ggc gag atg 927
His Phe Val Ser Gly Glu Arg Leu Ile Thr Asp Pro Ala Gly Glu Met
270 275 280 285

ggg cga gtc cag gac ttc ctg ggc att aag aga ttc atc acg gac aag 975
Gly Arg Val Gln Asp Phe Leu Gly Ile Lys Arg Phe Ile Thr Asp Lys
290 295 300

cac ttc tat ttc aac aag acc aaa gga ttc cct tgc ttg aaa aaa aca 1023
His Phe Tyr Phe Asn Lys Thr Lys Gly Phe Pro Cys Leu Lys Lys Thr
305 310 315

gaa tcg agc ctc ctg cct cga tgc ttg ggc aaa tca aaa ggg aga act 1071
Glu Ser Ser Leu Leu Pro Arg Cys Leu Gly Lys Ser Lys Gly Arg Thr
320 325 330

cat gta cag att gat cct gaa gtg ata gac cag ctc cga gaa ttt tat 1119
His Val Gln Ile Asp Pro Glu Val Ile Asp Gln Leu Arg Glu Phe Tyr
335 340 345

aga ccg tat aat atc aaa ttt tat gaa acc gtt ggg cag gac ttc agg 1167
Arg Pro Tyr Asn Ile Lys Phe Tyr Glu Thr Val Gly Gln Asp Phe Arg
350 355 360 365 



tgg gaa taagcccacg aaaggaaagg gctctcaagg gctcttctgc tcatctcttc 1223
Trp Glu

cgtgagattt gctcccagac cctcttatct ccctccaaca aaccctcrcrct W WWW V* w 1283

ttcccaactt cracrttcrcatc atcttcrciaac caaoaaocc c a cr f* t~ a a a fr r* r» 1343

agagtctctg ccactagttt tcatcagtct gttcaagcaa agttgatctg ctcctggcac 1403

gtccagtaaa ttccagaatc attctccttt ctgcccataa agggccttgg agaattgctt 1463

taagaagagt gaatgttcca atgatgatag atattataag cgacgatggt tctgttgcta 1523

tgaacacagc agtcggtccc tgtcattgtc cacccaggag tggccttgtt aattccaagt 1583

ggcatgtatc ttccctctga gcttcatttc ttcaagatgc tctgggtggt gggatgggag 1643

accatcctca gccctcctca gaccttatca attcattgag agattgcaaa gctgaaagca 1703

cctccggcca ctcctgggag acagaccctt tggt^atgaa ataaaccagt gacttcagag 1763

cctatggtct caactgtgct tgaaaaacac tgtctctgaa aacaactttg tgattctccc 1823

tgctccctgt ggacaaaagc acataattct gctgttacgg gtactttgct catacgagct 1883

ttcatgttca gcatgcaatg gaatcatgct tgtccatgtg aaataaatat ggctctctcg 1943

tgtcctta 1951


<210> 6 
<211> 367 
<212> PRT 
<213> Homo sapiens

<400> 6 

Met Ala Tyr Arg Val Leu Gly Arg Ala Gly Pro Pro Gln Pro Arg Arg
1 5 10 15

Ala Arg Arg Leu Leu Phe Ala Phe Thr Leu Ser Leu Ser Cys Thr Tyr
20 25 30

Leu Cys Tyr Ser Phe Leu Cys Cys Cys Asp Asp Leu Gly Arg Ser Arg
35 40 45

Leu Leu Gly Ala Pro Arg Cys Leu Arg Gly Pro Ser Ala Gly Gly Gln
50 55 60

Lys Leu Leu Gln Lys Ser Arg Pro Cys Asp Pro Ser Gly Pro Thr Pro
65 70 75 80

Ser Glu Pro Ser Ala Pro Ser Ala Pro Ala Ala Ala Val Pro Ala Pro
85 90 95

Arg Leu Ser Gly Ser Asn His Ser Gly Ser Pro Lys Leu Gly Thr Lys
100 105 110

Arg Leu Pro Gln Ala Leu Ile Val Gly Val Lys Lys Gly Gly Thr Arg
115 120 125

Ala Val Leu Glu Phe Ile Arg Val His Pro Asp Val Arg Ala Leu Gly
130 135 140

Thr Glu Pro His Phe Phe Asp Arg Asn Tyr Gly Arg Gly Leu Asp Trp
145 150 155 160

Tyr Arg Ser Leu Met Pro Arg Thr Leu Glu Ser Gln Ile Thr Leu Glu
165 170 175

Lys Thr Pro Ser Tyr Phe Val Thr Gln Glu Ala Pro Arg Arg Ile Phe
180 185 190

Asn Met Ser Arg Asp Thr Lys Leu Ile Val Val Val Arg Asn Pro Val
195 200 205

Thr Arg Ala Ile Ser Asp Tyr Thr Gln Thr Leu Ser Lys Lys Pro Asp
210 215 220

Ile Pro Thr Phe Glu Gly Leu Ser Phe Arg Asn Arg Thr Leu Gly Leu
225 230 235 240

Val Asp Val Ser Trp Asn Ala Ile Arg Ile Gly Met Tyr Val Leu His
245 250 255

Leu Glu Ser Trp Leu Gln Tyr Phe Pro Leu Ala Gln Ile His Phe Val
260 265 270

Ser Gly Glu Arg Leu Ile Thr Asp Pro Ala Gly Glu Met Gly Arg Val
275 280 285

Gln Asp Phe Leu Gly Ile Lys Arg Phe Ile Thr Asp Lys His Phe Tyr
290 295 300

Phe Asn Lys Thr Lys Gly Phe Pro Cys Leu Lys Lys Thr Glu Ser Ser
305 310 315 320

Leu Leu Pro Arg Cys Leu Gly Lys Ser Lys Gly Arg Thr His Val Gln
325 330 335

Ile Asp Pro Glu Val Ile Asp Gln Leu Arg Glu Phe Tyr Arg Pro Tyr
340 345 350

Asn Ile Lys Phe Tyr Glu Thr Val Gly Gln Asp Phe Arg Trp Glu
355 360 365 

<210> 7 

<211> 2314 

<212> DNA 

<213> Homo sapiens 

<220> 
<221> CDS 

<222> (799)..(2016)
<223> human 3-OST-3A

<400> 7 

cagcggcggc ccaggaggca gccggtgagc gcctgcgagc agagtggcgg gggccgctga 60 
caggtcccgc gcagcccagc ccagcccagc cacgcggctc acaggtgggg tccaagagca 120 
gtttggagca acccggcgct acggagaggg gtggacggct ctgcacgggc ctcctgtctc 180 
ccgctcgggc agagggactc ggggggacct cgctccttgg ccgagagaac ctgaactcgg 240 
gcggagagaa cgcgcccagg cgggcaaggg gaccagagaa agccggggct ggaagtcact 300
gtcgctcgcc actgtctgga gcgcacggag cgcagaggcc cggcagccgc gcgtgccctc 360
ccggggaccg agccagtgat gcaggatcgc tgagcggaga tccgcgccga gaagtctctc 420
ggggccgggg ctgagacgca cgccuucgac accgctgcca agaccccgat tccggcgact 480
cttgcgggga accgaggggc caaggccgcc ccaagctcag gacttgggcg agtctaagac 540
gauygULLCL caagcacgga cccgcgttcc ccttcccgcc ccctcgactg gaggcaggga 600
tcctgcgcgg ggcccccggg attccgtttc cccgcggagc cccggccgct gcctcccggg 660
acagttcgca cggccacagg ggcgcacggc gatgtggcct ccgtccagcg cgctggcccg 720
ccggggggat gctctggcac ctgtcggggt ccaggcctag catggccggc gcgttgcccg 780

acgtcgcctc cggctagg atg gcc cct ccg ggc ccg gcc agt gcc ctc tcc 831
Met Ala Pro Pro Gly Pro Ala Ser Ala Leu Ser
1 5 10



acc tcg gcc gag ccg ctg tcc cgc agc atc ttc cgg aag ttc ttg ctg 879
Thr Ser Ala Glu Pro Leu Ser Arg Ser Ile Phe Arg Lys Phe Leu Leu
15 20 25

atg ctc tgc tcc ctg ctc acg tcc ctt tac gtc ttc tac tgc ctg gcc 927
Met Leu Cys Ser Leu Leu Thr Ser Leu Tyr Val Phe Tyr Cys Leu Ala
30 35 40

gag cgc tgc cag acc ctg tcc ggc ccc gtc gtg ggg ctg tcc ggc ggc 975
Glu Arg Cys Gln Thr Leu Ser Gly Pro Val Val Gly Leu Ser Gly Gly
45 50 55

ggc gag gag gcg ggg gcc cct ggt ggc ggc gtc ctg gcc gga ggc ccg 1023
Gly Glu Glu Ala Gly Ala Pro Gly Gly Gly Val Leu Ala Gly Gly Pro
60 65 70 75

agg gag ctg gcg gtg tgg ccg gcg gcg gca cag aga aag cgc ctc ctg 1071
Arg Glu Leu Ala Val Trp Pro Ala Ala Ala Gln Arg Lys Arg Leu Leu
80 85 90

caa ctg ccg cag tgg cgg agg cgc cgg ccg ccc gcg ccc cgc gac gac 1119
Gln Leu Pro Gln Trp Arg Arg Arg Arg Pro Pro Ala Pro Arg Asp Asp
95 100 105

ggc gag gag gcg gcc tgg gaa gaa gag tcc cct ggc ctg tca ggg ggt 1167
Gly Glu Glu Ala Ala Trp Glu Glu Glu Ser Pro Gly Leu Ser Gly Gly
110 115 120

ccg ggc ggc tcc ggg gcc gga agc acc gtg gcc gag gcc ccg ccg ggg 1215
Pro Gly Gly Ser Gly Ala Gly Ser Thr Val Ala Glu Ala Pro Pro Gly
125 130 135

acc ctg gcg ctg ctc ctg gac gaa ggc agc aag cag ctg ccg cag gcc 1263
Thr Leu Ala Leu Leu Leu Asp Glu Gly Ser Lys Gln Leu Pro Gln Ala
140 145 150 155

atc atc atc gga gtg aag aag ggc ggc acg cgg gcg ctg ctg gag ttc 1311
Ile Ile Ile Gly Val Lys Lys Gly Gly Thr Arg Ala Leu Leu Glu Phe
160 165 170

ctg cgc gtg cac ccc gac gtg cgc gcc gtg ggc gcc gag ccc cac ttc 1359
Leu Arg Val His Pro Asp Val Arg Ala Val Gly Ala Glu Pro His Phe
175 180 185

ttc gac cgc agc tac gac aag ggc ctc gcc tgg tac cgg gac ctg atg 1407
Phe Asp Arg Ser Tyr Asp Lys Gly Leu Ala Trp Tyr Arg Asp Leu Met 
190 195 200

ccc aga acc ctg gac ggg cag atc acc atg gag aag acg ccc agt tac 1455
Pro Arg Thr Leu Asp Gly Gln Ile Thr Met Glu Lys Thr Pro Ser Tyr
205 210 215

ttc gtc acg cgg gag gcc ccc gcg cgc atc tcg gcc atg tcc aag gac 1503
Phe Val Thr Arg Glu Ala Pro Ala Arg Ile Ser Ala Met Ser Lys Asp
220 225 230 235

acc aag ctc atc gtg gtg gtg cgg gac ccg gtg acc agg gcc atc tcg 1551
Thr Lys Leu Ile Val Val Val Arg Asp Pro Val Thr Arg Ala Ile Ser
240 245 250

gac tac acg cag acg ctg tcc aag cgg ccc gac atc ccc acc ttc gag 1599
Asp Tyr Thr Gln Thr Leu Ser Lys Arg Pro Asp Ile Pro Thr Phe Glu
255 260 265

agc ttg acg ttc aaa aac agg aca gcg ggc ctc atc gac acg tcg tgg 1647
Ser Leu Thr Phe Lys Asn Arg Thr Ala Gly Leu Ile Asp Thr Ser Trp
270 275 280

agc gcc atc cag atc ggc atc tac gcc aag cac ctg gag cac tgg ctg 1695
Ser Ala Ile Gln Ile Gly Ile Tyr Ala Lys His Leu Glu His Trp Leu
285 290 295

cgc cac ttc ccc atc cgc cag atg ctc ttc gtg agc ggc gag cgg ctc 1743
Arg His Phe Pro Ile Arg Gln Met Leu Phe Val Ser Gly Glu Arg Leu
300 305 310 315

atc agc gac ccg gcc ggg gag ctg ggc cgc gtg caa gac ttc ctg ggc 1791
Ile Ser Asp Pro Ala Gly Glu Leu Gly Arg Val Gln Asp Phe Leu Gly
320 325 330

ctc aag agg atc atc acg gac aag cac ttc tac ttc aac aag acc aag 1839
Leu Lys Arg Ile Ile Thr Asp Lys His Phe Tyr Phe Asn Lys Thr Lys
335 340 345

ggc ttc ccc tgc ctg aag aag gcg gag ggc agc agc cgg ccc cat tgc 1887
Gly Phe Pro Cys Leu Lys Lys Ala Glu Gly Ser Ser Arg Pro His Cys
350 355 360

ctg ggc aag acc aag ggc agg acc cat cct gag atc gac cgc gag gtg 1935
Leu Gly Lys Thr Lys Gly Arg Thr His Pro Glu Ile Asp Arg Glu Val
365 370 375

gtg cgc agg ctg cgc gag ttc tac cgg cct ttc aac ctc aag ttc tac 1983
Val Arg Arg Leu Arg Glu Phe Tyr Arg Pro Phe Asn Leu Lys Phe Tyr
380 385 390 395

cag atg acc ggg cac gac ttt ggc tgg gat gga taaccatata atttaaaaag 2036
Gln Met Thr Gly His Asp Phe Gly Trp Asp Gly
400 405

aaaaaaaaaa tcaaaatata atatattttt ttaccaatcg gtagagaaga gacagtttaa 2096 

tatttgtgct gaaaatatgt ttcagtattt ttttcaatga atgttaagag attgttctca 2156 

ctcccgcccc atcttaatgt ataaccaaca ccaaacacgt ggatcaacag aaaaggaaaa 2216 

tttcactcgt ctaaacactt tcaattttca gtttttattt tatgttctat atacccagtc 2276 

ataaagtata agcatcagtt gtcattaaaa gttttcag 2314 



<210> 8 
<211> 406 
<212> PRT 
<213> Homo sapiens
<400> 8

Met Ala Pro Pro Gly Pro Ala Ser Ala Leu Ser Thr Ser Ala Glu Pro
1 5 10 15

Leu Ser Arg Ser Ile Phe Arg Lys Phe Leu Leu Met Leu Cys Ser Leu
20 25 30

Leu Thr Ser Leu Tyr Val Phe Tyr Cys Leu Ala Glu Arg Cys Gln Thr
35 40 45

Leu Ser Gly Pro Val Val Gly Leu Ser Gly Gly Gly Glu Glu Ala Gly
50 55 60

Ala Pro Gly Gly Gly Val Leu Ala Gly Gly Pro Arg Glu Leu Ala Val
65 70 75 80

Trp Pro Ala Ala Ala Gln Arg Lys Arg Leu Leu Gln Leu Pro Gln Trp
85 90 95

Arg Arg Arg Arg Pro Pro Ala Pro Arg Asp Asp Gly Glu Glu Ala Ala
100 105 110

Trp Glu Glu Glu Ser Pro Gly Leu Ser Gly Gly Pro Gly Gly Ser Gly
115 120 125

Ala Gly Ser Thr Val Ala Glu Ala Pro Pro Gly Thr Leu Ala Leu Leu
130 135 140

Leu Asp Glu Gly Ser Lys Gln Leu Pro Gln Ala Ile Ile Ile Gly Val
145 150 155 160

Lys Lys Gly Gly Thr Arg Ala Leu Leu Glu Phe Leu Arg Val His Pro
165 170 175

Asp Val Arg Ala Val Gly Ala Glu Pro His Phe Phe Asp Arg Ser Tyr
180 185 190

Asp Lys Gly Leu Ala Trp Tyr Arg Asp Leu Met Pro Arg Thr Leu Asp
195 200 205

Gly Gln Ile Thr Met Glu Lys Thr Pro Ser Tyr Phe Val Thr Arg Glu
210 215 220

Ala Pro Ala Arg Ile Ser Ala Met Ser Lys Asp Thr Lys Leu Ile Val
225 230 235 240

Val Val Arg Asp Pro Val Thr Arg Ala Ile Ser Asp Tyr Thr Gln Thr
245 250 255

Leu Ser Lys Arg Pro Asp Ile Pro Thr Phe Glu Ser Leu Thr Phe Lys
260 265 270

Asn Arg Thr Ala Gly Leu Ile Asp Thr Ser Trp Ser Ala Ile Gln Ile
275 280 285

Gly Ile Tyr Ala Lys His Leu Glu His Trp Leu Arg His Phe Pro Ile
290 295 300

Arg Gln Met Leu Phe Val Ser Gly Glu Arg Leu Ile Ser Asp Pro Ala
305 310 315 320

Gly Glu Leu Gly Arg Val Gln Asp Phe Leu Gly Leu Lys Arg Ile Ile
325 330 335 

Thr Asp Lys His Phe Tyr Phe Asn Lys Thr Lys Gly Phe Pro Cys Leu 
340 345 350 
Lys Lys Ala Glu Gly Ser Ser Arg Pro His Cys Leu Gly Lys Thr Lys
355 360 365 

Gly Arg Thr His Pro Glu Ile Asp Arg Glu Val Val Arg Arg Leu Arg
370 375 380 

Glu Phe Tyr Arg Pro Phe Asn Leu Lys Phe Tyr Gln Met Thr Gly His
385 390 395 400

Asp Phe Gly Trp Asp Gly 
405 

<210> 9 

<211> 2032 

<212> DNA 

<213> Homo sapiens 

<220> 
<221> CDS 

<222> (331)..(1500)
<223> human 3-OST-3B 

<400> 9 

gtggccaggg cgcgagagtg caacgtcctc ctggccccga gcgcgtcgtc gcgccccggg 60 

agcagaccct cgcccagcag ttaccgccgt cccgactttc cgttccagtt gcagctcctg 120 

ccgggcaaca tgtcaagagc cgccgccgct acagctgccg ccgccacctg gggaagagca 180 

gcagcagcag cggcggccgc gggcacacgg gggcaataaa ccgagccacc cgggcgtcca 240 

gcgtgccggg gaaccctctc tgcgctcact gcccggcggg acccacgcca tgtgctgagc 300 

catgtccctg gccgcgcccg cgggcagcgc atg ggg cag cgc ctg agt ggc ggc 354 

Met Gly Gln Arg Leu Ser Gly Gly
1 5 

aga tct tgc ctc gat gtc ccc ggc cgg ctc cta ccg cag ccg ccg ccg 402
Arg Ser Cys Leu Asp Val Pro Gly Arg Leu Leu Pro Gln Pro Pro Pro
10 15 20 

ccc ccg ccg ccg gtg agg agg aag ctc gcg ctg ctc ttc gcc atg ctc 450
Pro Pro Pro Pro Val Arg Arg Lys Leu Ala Leu Leu Phe Ala Met Leu 
25 30 35 40 

tgc gtc tgg ctc tat atg ttc ctg tac tcg tgc gcc ggc tcc tgc gcc 498
Cys Val Trp Leu Tyr Met Phe Leu Tyr Ser Cys Ala Gly Ser Cys Ala 
45 50 55 

gcc gcg ccg ggg ctg ctg ctc ctg ggc tct ggg tcc cgc gcc gca cac 546
Ala Ala Pro Gly Leu Leu Leu Leu Gly Ser Gly Ser Arg Ala Ala His
60 65 70

gac ccg cca gcc ctg gcc aca gct ccg gac ggg acg ccc ccc agg ctg 594
Asp Pro Pro Ala Leu Ala Thr Ala Pro Asp Gly Thr Pro Pro Arg Leu 
75 80 85 

ccg ttc cgg gcg ccg cca gcc acc cca ctg gct tca ggc aag gag atg 642
Pro Phe Arg Ala Pro Pro Ala Thr Pro Leu Ala Ser Gly Lys Glu Met 
90 95 100 

gcc gag ggc gct gcg agc ccg gag gag cag agt ccc gag gtg ccg gac 690
Ala Glu Gly Ala Ala Ser Pro Glu Glu Gln Ser Pro Glu Val Pro Asp
105 110 115 120

tcc cca agc ccc atc tcc agc ttt ttc agt ggg tct ggg agc aag cag 738
Ser Pro Ser Pro Ile Ser Ser Phe Phe Ser Gly Ser Gly Ser Lys Gln
125 130 135

ctg ccg cag gcc atc atc atc ggc gtg aag aag ggc ggc acg cgg gcg 786
Leu Pro Gln Ala Ile Ile Ile Gly Val Lys Lys Gly Gly Thr Arg Ala
140 145 150 

ctg ctg gag ttt ctg cgc gtg cac ccc gac gtg cgc gcc gtg ggc gcc 834 
Leu Leu Glu Phe Leu Arg Val His Pro Asp Val Arg Ala Val Gly Ala 
155 160 165 

gag ccc cat ttc ttc gat cgc agc tac gac aag ggc ctc gct tgg tac 882
Glu Pro His Phe Phe Asp Arg Ser Tyr Asp Lys Gly Leu Ala Trp Tyr 
170 175 180 

cgg gac ctg atg ccc aga acc ctg gac ggg cag atc acc atg gag aag 930
Arg Asp Leu Met Pro Arg Thr Leu Asp Gly Gln Ile Thr Met Glu Lys
185 190 195 200 

acg ccc agt tac ttc gtc acg cgg gag gcc ccc gcg cgc atc tcg gcc 978
Thr Pro Ser Tyr Phe Val Thr Arg Glu Ala Pro Ala Arg Ile Ser Ala
205 210 215 

atg tcc aag gac acc aag ctc atc gtg gtg gtg cgg gac ccg gtg acc 1026
Met Ser Lys Asp Thr Lys Leu Ile Val Val Val Arg Asp Pro Val Thr
220 225 230 

agg gcc atc tcg gac tac acg cag acg ctg tcc aag cgg ccc gac atc 1074
Arg Ala Ile Ser Asp Tyr Thr Gln Thr Leu Ser Lys Arg Pro Asp Ile
235 240 245 

ccc acc ttc gag agc ttg acg ttc aaa aac agg aca gcg ggc ctc atc 1122
Pro Thr Phe Glu Ser Leu Thr Phe Lys Asn Arg Thr Ala Gly Leu Ile
250 255 260 

gac acg tcg tgg agc gcc atc cag atc ggc atc tac gcc aag cac ctg 1170
Asp Thr Ser Trp Ser Ala Ile Gln Ile Gly Ile Tyr Ala Lys His Leu
265 270 275 280 

gag cac tgg ctg cgc cac ttc ccc atc cgc cag atg ctc ttc gtg agc 1218
Glu His Trp Leu Arg His Phe Pro Ile Arg Gln Met Leu Phe Val Ser
285 290 295 

ggc gag cgg ctc atc agc gac ccg gcc ggg gag ctg ggc cgc gtg caa 1266
Gly Glu Arg Leu Ile Ser Asp Pro Ala Gly Glu Leu Gly Arg Val Gln
300 305 310 

gac ttc ctg ggc ctc aag agg atc atc acg gac aag cac ttc tac ttc 1314
Asp Phe Leu Gly Leu Lys Arg Ile Ile Thr Asp Lys His Phe Tyr Phe
315 320 325 

aac aag acc aag ggc ttc ccc tgc ctg aag aag gcg gag ggc agc agc 1362
Asn Lys Thr Lys Gly Phe Pro Cys Leu Lys Lys Ala Glu Gly Ser Ser 
330 335 340 

cgg ccc cat tgc ctg ggc aag acc aag ggc agg acc cat cct gag atc 1410
Arg Pro His Cys Leu Gly Lys Thr Lys Gly Arg Thr His Pro Glu Ile
345 350 355 360 

gac cgc gag gtg gtg cgc agg ctg cgc gag ttc tac cgg cct ttc aac 1458
Asp Arg Glu Val Val Arg Arg Leu Arg Glu Phe Tyr Arg Pro Phe Asn 
365 370 375 

ctc aag ttc tac cag atg acc ggg cac gac ttt ggc tgg gat 1500
Leu Lys Phe Tyr Gln Met Thr Gly His Asp Phe Gly Trp Asp
380 385 390 

tgagcagacc cgggctatgt accttaccca cgtggcttat ctattgacag agattatatg 1560
tatgtaaaat gtacagaaat ctattttata ataatttatt tttaattcat aagcaattaa 1620

ttcactaagc tgcctagcca cactctttag agagttagct tcataatctg ttaacattcc 1680 

aaagtgttta actctagtat ttcgttttct tcttcacaat tgatggtgct tctatttttt 1740 

cttctcccct acctgttata tttaaaacaa agaaaagcac aacttgagat ttttgttgtt 1800 

acgggtattc agccttcagt caccgtctga gttctccagt tgctgcctcc ttgtcttgtc 1860 

ttgggtctcc cattccagct tccctgtctc ttcctgcctg tgtacctcgt aggaacgctg 1920 

agctgcctca acagggctgt attctgaagg gcaggcctca tgcagcagcc tccttgcaga 1980 

tgtggtgtcc cgtccaatga tgtagcctga aagccacagc cctagggttc tg 2032 

<210> 10 

<211> 390 

<212> PRT 

<213> Homo sapiens 

<400> 10 

Met Gly Gln Arg Leu Ser Gly Gly Arg Ser Cys Leu Asp Val Pro Gly
1 5 10 15 

Arg Leu Leu Pro Gln Pro Pro Pro Pro Pro Pro Pro Val Arg Arg Lys
20 25 30 

Leu Ala Leu Leu Phe Ala Met Leu Cys Val Trp Leu Tyr Met Phe Leu 
35 40 45 

Tyr Ser Cys Ala Gly Ser Cys Ala Ala Ala Pro Gly Leu Leu Leu Leu 
50 55 60 

Gly Ser Gly Ser Arg Ala Ala His Asp Pro Pro Ala Leu Ala Thr Ala 
65 70 75 80 

Pro Asp Gly Thr Pro Pro Arg Leu Pro Phe Arg Ala Pro Pro Ala Thr 
85 90 95

Pro Leu Ala Ser Gly Lys Glu Met Ala Glu Gly Ala Ala Ser Pro Glu 
100 105 110 

Glu Gln Ser Pro Glu Val Pro Asp Ser Pro Ser Pro Ile Ser Ser Phe
115 120 125 

Phe Ser Gly Ser Gly Ser Lys Gln Leu Pro Gln Ala Ile Ile Ile Gly
130 135 140 

Val Lys Lys Gly Gly Thr Arg Ala Leu Leu Glu Phe Leu Arg Val His 
145 150 155 160 

Pro Asp Val Arg Ala Val Gly Ala Glu Pro His Phe Phe Asp Arg Ser 
165 170 175 

Tyr Asp Lys Gly Leu Ala Trp Tyr Arg Asp Leu Met Pro Arg Thr Leu 
180 185 190 

Asp Gly Gln Ile Thr Met Glu Lys Thr Pro Ser Tyr Phe Val Thr Arg
195 200 205 

Glu Ala Pro Ala Arg Ile Ser Ala Met Ser Lys Asp Thr Lys Leu Ile
210 215 220 

Val Val Val Arg Asp Pro Val Thr Arg Ala Ile Ser Asp Tyr Thr Gln
225 230 235 240 
Thr Leu Ser Lys Arg Pro Asp Ile Pro Thr Phe Glu Ser Leu Thr Phe
245 250 255

Lys Asn Arg Thr Ala Gly Leu Ile Asp Thr Ser Trp Ser Ala Ile Gln
260 265 270

Ile Gly Ile Tyr Ala Lys His Leu Glu His Trp Leu Arg His Phe Pro
275 280 285

Ile Arg Gln Met Leu Phe Val Ser Gly Glu Arg Leu Ile Ser Asp Pro
290 295 300 

Ala Gly Glu Leu Gly Arg Val Gln Asp Phe Leu Gly Leu Lys Arg Ile
305 310 315 320 

Ile Thr Asp Lys His Phe Tyr Phe Asn Lys Thr Lys Gly Phe Pro Cys
325 330 335 

Leu Lys Lys Ala Glu Gly Ser Ser Arg Pro His Cys Leu Gly Lys Thr 
340 345 350 

Lys Gly Arg Thr His Pro Glu Ile Asp Arg Glu Val Val Arg Arg Leu
355 360 365 

Arg Glu Phe Tyr Arg Pro Phe Asn Leu Lys Phe Tyr Gln Met Thr Gly
370 375 380 

His Asp Phe Gly Trp Asp 
385 390 

<210> 11 

<211> 3658 

<212> DNA 

<213> Homo sapiens 

<220> 
<221> CDS 

<222> (847)..(2214)
<220> 

<223> Predicted human 3-OST-4 hnRNA 
<400> 11

gaggatatcc cgggcgagag aagggagggt cggggatggg ctgagttgga gtcccagagg 60

aaaagcggaa gcgagagctt cgtcacccgc tgtcttccag ctcccggtgc gcggcaccgg 120

aggcaggcgt tgggctttac ctctctaaaa gtactggggc aaaggaatgg agaacacggc 180

gtcccgagct cccaagggag gggagtaaac gaggtggggt ggggaacacc ccaagtgcgt 240

gcgtgctggg gggctggggg gcacgatctc cgttctcccg ggtgccccag ccctagcgca 300

cgcctccgct cccccgcccc cttcgcaggc gcgcgcgagg cgcacccccc ttccctcggc 360

ggcgccgggc gcgcgcccgg ccccctcctc ctcccctccg cgcctctcct ctctcccggc 420

agaaagttag cagcggggaa ggaactctgg gctgcaacag cgcgcggcgg cggcggcaga 480

ggctgaagca gaagccgcgg cggagccggg gaagcggggg cgctgcagac ggagcaggtg 540

ccgccggcgg gtccgcgcgc ccccctcggt ccccttgcct gaggctgagg ggggggcggt 600

ggtggggggg ccactcggac tcggcgggca gcgtggggcg gggggccatg cggccgggct 660

cccccctggc gcagcgggac agcggccagg gccgggggcg cagcggcgtc gcttcatgca 720

gccggggcgg ctgggcagcg gcggcggcgg cggcggcggc ggcggcgggg gcggcggctg 780

aaaccatgtc cgggcagcgc cgggggctgc cgccgccgcc gccgccgccg cgagccggga 840 

gccgcg atg gcc cgg tgg ccc gca cct cct ccg cct ccg cct ccg cct 888
Met Ala Arg Trp Pro Ala Pro Pro Pro Pro Pro Pro Pro Pro 
1 5 10 

cca cct ctg gcc gcg ccg ccg ccg ccc ggc gcc tct gct aag ggg ccg 936
Pro Pro Leu Ala Ala Pro Pro Pro Pro Gly Ala Ser Ala Lys Gly Pro 
15 20 25 30 

ccg gcg cgc aag ctg ctt ttt atg tgc acc ttg tcc ctg tct gtc acc 984
Pro Ala Arg Lys Leu Leu Phe Met Cys Thr Leu Ser Leu Ser Val Thr 
35 40 45 

tac ctg tgc tac agc ctc ctg ggc ggc tcg ggc tcc ctg caa ttc cct 1032
Tyr Leu Cys Tyr Ser Leu Leu Gly Gly Ser Gly Ser Leu Gln Phe Pro
50 55 60 

ctg gcg ctg cag gag tcg ccg ggc gcc gcc gcc gag ccc ccg ccg agc 1080
Leu Ala Leu Gln Glu Ser Pro Gly Ala Ala Ala Glu Pro Pro Pro Ser
65 70 75

ccg ccg cca ccc tct ctg ctg cct acc ccc gtg cgc ctc ggc gcc ccc 1128
Pro Pro Pro Pro Ser Leu Leu Pro Thr Pro Val Arg Leu Gly Ala Pro 
80 85 90 

tcg cag ccg ccc gcg ccg ccg ccg ctg gac aac gcg agc cac ggg gag 1176
Ser Gln Pro Pro Ala Pro Pro Pro Leu Asp Asn Ala Ser His Gly Glu
95 100 105 110 

ccg ccc gag ccc cca gag cag cca gcc gcc ccc ggg acc gac ggc tgg 1224 
Pro Pro Glu Pro Pro Glu Gln Pro Ala Ala Pro Gly Thr Asp Gly Trp
115 120 125 

ggg ctg ccg agc ggc ggc gga ggc gcc cgg gac gcc tgg ctc cgg acc 1272
Gly Leu Pro Ser Gly Gly Gly Gly Ala Arg Asp Ala Trp Leu Arg Thr 
130 135 140 

ccg ctg gcc ccc agc gag atg atc acg gct cag agc gcg ctg ccg gag 1320
Pro Leu Ala Pro Ser Glu Met Ile Thr Ala Gln Ser Ala Leu Pro Glu
145 150 155 

agg gaa gcg cag gag tcc agc acc acc gac gag gat ctc gca ggc cgg 1368
Arg Glu Ala Gln Glu Ser Ser Thr Thr Asp Glu Asp Leu Ala Gly Arg
160 165 170 

aga gcg gcc aac ggg agc agc gag agg ggc ggc gcc gtc agc acc ccc 1416
Arg Ala Ala Asn Gly Ser Ser Glu Arg Gly Gly Ala Val Ser Thr Pro 
175 180 185 190 

gac tat ggg gag aag aag ctg cca cag gcg ctc atc atc ggg gtc aag 1464
Asp Tyr Gly Glu Lys Lys Leu Pro Gln Ala Leu Ile Ile Gly Val Lys
195 200 205 

aaa gga ggg acc cgc gcg ctg ctg gag gcg atc cgc gtg cac ccg gac 1512
Lys Gly Gly Thr Arg Ala Leu Leu Glu Ala Ile Arg Val His Pro Asp
210 215 220 

gtg cgg gcg gtg ggc gta gag ccg cac ttc ttc gac agg aac tac gaa 1560
Val Arg Ala Val Gly Val Glu Pro His Phe Phe Asp Arg Asn Tyr Glu 
225 230 235 

aag ggg ttg gag tgg tac aga aat gtg atg ccc aag act ttg gat ggg 1608 
Lys Gly Leu Glu Trp Tyr Arg Asn Val Met Pro Lys Thr Leu Asp Gly 
240 245 250 
caa ata acc atg gag aag act cca agt tac ttt gtg aca aat gag gct 1656
Gln Ile Thr Met Glu Lys Thr Pro Ser Tyr Phe Val Thr Asn Glu Ala
255 260 265 270 

ccc aag cgc att cac tcc atg gcc aag gac atc aaa ctg att gtg gtg 1704
Pro Lys Arg Ile His Ser Met Ala Lys Asp Ile Lys Leu Ile Val Val
275 280 285

gtg aga aac ccc gtg acc agg gcc atc tct gac tac acg cag aca ctg 1752
Val Arg Asn Pro Val Thr Arg Ala Ile Ser Asp Tyr Thr Gln Thr Leu
290 295 300 

tca aag aaa ccc gag atc ccc acc ttt gag gtg ctg gcc ttc aaa aac 1800
Ser Lys Lys Pro Glu Ile Pro Thr Phe Glu Val Leu Ala Phe Lys Asn
305 310 315 

cgg acc ctc ggg ctg atc gat gct tcc tgg agt gcc att cga ata ggg 1848
Arg Thr Leu Gly Leu Ile Asp Ala Ser Trp Ser Ala Ile Arg Ile Gly
320 325 330

atc tat gcg ctg cat ctg gaa aac tgg ctc cag tat ttc ccc ctc tcc 1896
Ile Tyr Ala Leu His Leu Glu Asn Trp Leu Gln Tyr Phe Pro Leu Ser
335 340 345 350 

cag atc ctc ttt gtc agt ggt gag cga ctc att gtg gac ccc gcc ggg 1944
Gln Ile Leu Phe Val Ser Gly Glu Arg Leu Ile Val Asp Pro Ala Gly
355 360 365

gaa atg gcc aaa gta cag gat ttt cta ggc ctc aaa cgt gtt gtg act 1992
Glu Met Ala Lys Val Gln Asp Phe Leu Gly Leu Lys Arg Val Val Thr
370 375 380 

aag aag cat ttc tat ttc aac aaa acc aag ggg ttc cct tgc cta aag 2040
Lys Lys His Phe Tyr Phe Asn Lys Thr Lys Gly Phe Pro Cys Leu Lys 
385 390 395 

aag cca gaa gac agc agt gcc ccg agg tgc tta ggc aag agc aaa ggt 2088
Lys Pro Glu Asp Ser Ser Ala Pro Arg Cys Leu Gly Lys Ser Lys Gly
400 405 410

cgg act cat cct cgc att gac cca gat gtc atc cac aga ctg agg aaa 2136
Arg Thr His Pro Arg Ile Asp Pro Asp Val Ile His Arg Leu Arg Lys
415 420 425 430 

ttc tac aaa ccc ttc aac ttg atg ttt tac caa atg act ggt caa gat 2184 
Phe Tyr Lys Pro Phe Asn Leu Met Phe Tyr Gln Met Thr Gly Gln Asp
435 440 445 

ttt cag tgg gaa cag gaa gag ggt gat aaa tgaggctaga gaggcagagg 2234 
Phe Gln Trp Glu Gln Glu Glu Gly Asp Lys
450 455 

aaggctagtc aataagctaa ggaggctcct tgcctgagtc cttgaatacc ccagcttctg 2294 

cagcttcact tgctggagtg ccaagtagat ctcctcctcc ttcatgcagc caggattgcc 2354

tccagtgctg ttagcttagg caaacaggtg gatcccatgg catccccatg gaggaaccag 2414 

gcccatctgg gcagcagcat ctggttgacc agatggccac cagaacccac tgttcattct 2474

tatcttctgc tagttaatat agcctgaaga cagaggataa atagttgtca atgtcagaga 2534

cagtgctatt aatgtatatg tgagcgacaa aaaaggtctg ctttataggg gttctcactc 2594

tagcttgggg agcccagggt tctagccctg tatctgtcat gggcacctgc tgtctaaacc 2654

tctgcttggg cttctcccca gaatgcactt tgtggctgag tgctccagga ctcctaggga 2714
gcaagctcct ccctctaagg tgtttctagt cttctcttta aaggtctcat cccacaaccc 2774
ctgacttcct ccctccccac atcatgaagg cagaggcatg cacattcctc actgaaaaag 2834 
aaaacacaca cccacccaca cacacacaca cagaagaaaa tgaaagctga cacacctcga 2894 
agccttcttt ccaagagccc tctaaatggg gttgggtctc actcttcatg agtatcctgg 2954 
gttgtgcaga agcttagcat atgcccttgt gttcggatca ggcccacagg gctgctcaaa 3014 
gagtagagta attgtaaccg aggtcagagc tctggggttg gcagagatga gtggccatat 3074 
ctgggggtaa aagaagaaat cctgtcctct tggtgggagg ttaccttacc tgaagaccat 3134 
ctctcccaag cactgtagtt ctgagcatgt ttttggggtg gactctgtcc cctagggtcc 3194 
ctagaagggc aaagaccaga gagttgacaa gtctgttatt aggaataatc cttagccatg 3254 
taatggagaa aggagcagtc agcattcttc caatttgccc caccaccacc tcctcgggct 3314 
tcattttctc tatttagaga tggcagagag tgaggtagtg gcgagaaagc tgactccatt 3374 
catcagatcc agtttatgag ggttgggggt gagcaagggc tgtctgcaga aacccccatc 3434 
aagagctgct gaatgaagtg tcccttccca tcagtttgat tcaattaaaa tgcatcattt 3494 
gacataaagc acttgttcac agatctccaa aaccaggaat tgttctagta aaactggaaa 3554 
tttgtatgag tggggggagt taaatctgtt cagctgttat taaactgtca tttctcccgc 3614 
taaatgaaaa ccgtgttgtt ataaagctta atgcaacctg atta 3658 

<210> 12 

<211> 456 

<212> PRT 

<213> Homo sapiens 

<400> 12 

Met Ala Arg Trp Pro Ala Pro Pro Pro Pro Pro Pro Pro Pro Pro Pro
1 5 10 15

Leu Ala Ala Pro Pro Pro Pro Gly Ala Ser Ala Lys Gly Pro Pro Ala 
20 25 30

Arg Lys Leu Leu Phe Met Cys Thr Leu Ser Leu Ser Val Thr Tyr Leu 
35 40 45 

Cys Tyr Ser Leu Leu Gly Gly Ser Gly Ser Leu Gln Phe Pro Leu Ala
50 55 60 

Leu Gln Glu Ser Pro Gly Ala Ala Ala Glu Pro Pro Pro Ser Pro Pro
65 70 75 80

Pro Pro Ser Leu Leu Pro Thr Pro Val Arg Leu Gly Ala Pro Ser Gln
85 90 95 

Pro Pro Ala Pro Pro Pro Leu Asp Asn Ala Ser His Gly Glu Pro Pro 
100 105 110

Glu Pro Pro Glu Gln Pro Ala Ala Pro Gly Thr Asp Gly Trp Gly Leu
115 120 125

Pro Ser Gly Gly Gly Gly Ala Arg Asp Ala Trp Leu Arg Thr Pro Leu 
130 135 140 

Ala Pro Ser Glu Met Ile Thr Ala Gln Ser Ala Leu Pro Glu Arg Glu
145 150 155 160

Ala Gln Glu Ser Ser Thr Thr Asp Glu Asp Leu Ala Gly Arg Arg Ala
165 170 175

Ala Asn Gly Ser Ser Glu Arg Gly Gly Ala Val Ser Thr Pro Asp Tyr 
180 185 190 

Gly Glu Lys Lys Leu Pro Gln Ala Leu Ile Ile Gly Val Lys Lys Gly
195 200 205 

Gly Thr Arg Ala Leu Leu Glu Ala Ile Arg Val His Pro Asp Val Arg
210 215 220 

Ala Val Gly Val Glu Pro His Phe Phe Asp Arg Asn Tyr Glu Lys Gly 
225 230 235 240 

Leu Glu Trp Tyr Arg Asn Val Met Pro Lys Thr Leu Asp Gly Gln Ile
245 250 255 

Thr Met Glu Lys Thr Pro Ser Tyr Phe Val Thr Asn Glu Ala Pro Lys 
260 265 270 

Arg Ile His Ser Met Ala Lys Asp Ile Lys Leu Ile Val Val Val Arg
275 280 285

Asn Pro Val Thr Arg Ala Ile Ser Asp Tyr Thr Gln Thr Leu Ser Lys
290 295 300 

Lys Pro Glu Ile Pro Thr Phe Glu Val Leu Ala Phe Lys Asn Arg Thr
305 310 315 320 

Leu Gly Leu Ile Asp Ala Ser Trp Ser Ala Ile Arg Ile Gly Ile Tyr
325 330 335 

Ala Leu His Leu Glu Asn Trp Leu Gln Tyr Phe Pro Leu Ser Gln Ile
340 345 350 

Leu Phe Val Ser Gly Glu Arg Leu Ile Val Asp Pro Ala Gly Glu Met
355 360 365 

Ala Lys Val Gln Asp Phe Leu Gly Leu Lys Arg Val Val Thr Lys Lys
370 375 380

His Phe Tyr Phe Asn Lys Thr Lys Gly Phe Pro Cys Leu Lys Lys Pro 
385 390 395 400

Glu Asp Ser Ser Ala Pro Arg Cys Leu Gly Lys Ser Lys Gly Arg Thr 
405 410 415 

His Pro Arg Ile Asp Pro Asp Val Ile His Arg Leu Arg Lys Phe Tyr
420 425 430 

Lys Pro Phe Asn Leu Met Phe Tyr Gln Met Thr Gly Gln Asp Phe Gln
435 440 445 

Trp Glu Gln Glu Glu Gly Asp Lys
450 455 

<210> 13 

<211> 284 

<212> PRT 

<213> Homo sapiens 



<220> 

<223> human NST-1 (aa 599 to 882) 
<400> 13

Lys Thr Cys Asp Arg Phe Pro Lys Leu Leu Ile Ile Gly Pro Gln Lys
1 5 10 15 

Thr Gly Thr Thr Ala Leu Tyr Leu Phe Leu Gly Met His Pro Asp Leu 
20 25 30 

Ser Ser Asn Tyr Pro Ser Ser Glu Thr Phe Glu Glu Ile Gln Phe Phe
35 40 45 

Asn Gly His Asn Tyr His Lys Gly Ile Asp Trp Tyr Met Glu Phe Phe
50 55 60 

Pro Ile Pro Ser Asn Thr Thr Ser Asp Phe Tyr Phe Glu Lys Ser Ala
65 70 75 80

Asn Tyr Phe Asp Ser Glu Val Ala Pro Arg Arg Ala Ala Ala Leu Leu 
85 90 95 

Pro Lys Ala Lys Val Leu Thr Ile Leu Ile Asn Pro Ala Asp Arg Ala
100 105 110

Tyr Ser Trp Tyr Gln His Gln Arg Ala His Asp Asp Pro Val Ala Leu
115 120 125 

Lys Tyr Thr Phe His Glu Val Ile Thr Ala Gly Ser Asp Ala Ser Ser
130 135 140 

Lys Leu Arg Ala Leu Gln Asn Arg Cys Leu Val Pro Gly Trp Tyr Ala
145 150 155 160 

Thr His Ile Glu Arg Trp Leu Ser Ala Tyr His Ala Asn Gln Ile Leu
165 170 175 

Val Leu Asp Gly Lys Leu Leu Arg Thr Glu Pro Ala Lys Val Met Asp 
180 185 190 

Met Val Gln Lys Phe Leu Gly Val Thr Asn Thr Ile Asp Tyr His Lys
195 200 205 

Thr Leu Ala Phe Asp Pro Lys Lys Gly Phe Trp Cys Gln Leu Leu Glu
210 215 220 

Gly Gly Lys Thr Lys Cys Leu Gly Lys Ser Lys Gly Arg Lys Tyr Pro 
225 230 235 240 

Glu Met Asp Leu Asp Ser Arg Ala Phe Leu Lys Asp Tyr Tyr Arg Asp 
245 250 255 

His Asn Ile Glu Leu Ser Lys Leu Leu Tyr Lys Met Gly Gln Thr Leu
260 265 270 

Pro Thr Trp Leu Arg Glu Asp Leu Gln Asn Thr Arg
275 280 

<210> 14 
<211> 286 
<212> PRT 

<213> Homo sapiens 
<220> 

<223> human NST-2 (aa 598 to 883) 
<400> 14 

Lys Thr Cys Asp Arg Leu Pro Lys Phe Leu Ile Val Gly Pro Gln Lys
1 5 10 15 
Thr Gly Thr Thr Ala Ile His Phe Phe Leu Ser Leu His Pro Ala Val
20 25 30 

Thr Ser Ser Phe Pro Ser Pro Ser Thr Phe Glu Glu Ile Gln Phe Phe
35 40 45 

Asn Ser Pro Asn Tyr His Lys Gly Ile Asp Trp Tyr Met Asp Phe Phe
50 55 60 

Pro Val Pro Ser Asn Ala Ser Thr Asp Phe Leu Phe Glu Lys Ser Ala 
65 70 75 80 

Thr Tyr Phe Asp Ser Glu Val Val Pro Arg Arg Gly Ala Ala Leu Leu 
85 90 95

Pro Arg Ala Lys Ile Ile Thr Val Leu Thr Asn Pro Ala Asp Arg Ala
100 105 110

Tyr Ser Trp Tyr Gln His Gln Arg Ala His Gly Asp Pro Val Ala Leu
115 120 125

Asn Tyr Thr Phe Tyr Gln Val Ile Ser Ala Ser Ser Gln Thr Pro Leu
130 135 140 

Ala Leu Arg Ser Leu Gln Asn Arg Cys Leu Val Pro Gly Tyr Tyr Ser
145 150 155 160 

Thr His Leu Gln Arg Trp Leu Thr Tyr Tyr Pro Ser Gly Gln Leu Leu
165 170 175 

Ile Val Asp Gly Gln Glu Leu Arg Thr Asn Pro Ala Ala Ser Met Glu
180 185 190 

Ser Ile Gln Lys Phe Leu Gly Ile Thr Pro Phe Leu Asn Tyr Thr Arg
195 200 205 

Thr Leu Arg Phe Asp Asp Asp Lys Gly Phe Trp Cys Gln Gly Leu Glu
210 215 220 

Gly Gly Lys Thr Arg Cys Leu Gly Arg Ser Lys Gly Arg Arg Tyr Pro 
225 230 235 240 

Asp Met Asp Thr Glu Ser Arg Leu Phe Leu Thr Asp Phe Phe Arg Asn 
245 250 255 

His Asn Leu Glu Leu Ser Lys Leu Leu Ser Arg Leu Gly Gln Pro Val
260 265 270

Pro Ser Trp Leu Arg Glu Glu Leu Gln His Ser Ser Leu Gly
275 280 285 

<210> 15 
<211> 291 
<212> PRT 

<213> Caenorhabditis elegans 
<220> 

<223> putative C. elegans 3-OST 
<400> 15 

Met Lys Tyr Arg Leu Leu Leu Ile Leu His Leu Ile Asp Leu Ile Ser
1 5 10 15 

Cys Gly Val Ile Pro Asn Thr Ser Lys Lys Arg Phe Pro Asp Ala Ile
20 25 30 

Ile Val Gly Val Lys Lys Ser Gly Thr Arg Ala Leu Leu Glu Phe Leu
35 40 45

Arg Val Asn Pro Leu Ile Lys Ala Pro Gly Pro Glu Val His Phe Phe
50 55 60 

Asp Lys Asn Phe Asn Lys Gly Leu Glu Trp Tyr Arg Glu Gln Met Pro
65 70 75 80 

Glu Thr Lys Phe Gly Glu Val Thr Ile Glu Lys Ser Pro Ala Tyr Phe
85 90 95 

His Ser Lys Met Ala Pro Glu Arg Ile Lys Ser Leu Asn Pro Asn Thr
100 105 110 

Lys Ile Ile Ile Val Val Arg Asp Pro Val Thr Arg Ala Ile Ser Asp
115 120 125 

Tyr Thr Gln Ser Ser Ser Lys Arg Lys Arg Val Gly Leu Met Pro Ser
130 135 140 

Phe Glu Thr Met Ala Val Gly Asn Cys Ala Asn Trp Leu Arg Thr Asn 
145 150 155 160 

Cys Thr Thr Lys Thr Arg Gly Val Asn Ala Gly Trp Gly Ala Ile Arg
165 170 175 

Ile Gly Val Tyr His Lys His Met Lys Arg Trp Leu Asp His Phe Pro
180 185 190 

Ile Glu Asn Ile His Ile Val Asp Gly Glu Lys Leu Ile Ser Asn Pro
195 200 205 

Ala Asp Glu Ile Ser Ala Thr Glu Lys Phe Leu Gly Leu Lys Pro Val
210 215 220 

Ala Lys Pro Glu Lys Phe Gly Val Asp Pro Ile Lys Lys Phe Pro Cys
225 230 235 240 

Ile Lys Asn Glu Asp Gly Lys Leu His Cys Leu Gly Lys Thr Lys Gly
245 250 255

Arg His His Pro Asp Val Glu Pro Ser Val Leu Lys Thr Leu Arg Glu 
260 265 270 

Phe Tyr Gly Pro Glu Asn Lys Lys Phe Tyr Gln Met Ile Asn His Trp
275 280 285 

Phe Asp Trp 
290 

<210> 16 

<211> 4045 

<212> DNA 

<213> Homo sapiens 

<220> 

<223> 3-OST-4 5' promoter/exon
<400> 16 

gaattctgtg ggtgttggca ggggagacag aaaactatct tccatcgagt cttcggatcc 60 
attgggaatg cctggatgac gtcagagttc gccctgtgta ggtagctccc acttttcatt 120 
gtaggtttct caaggacttg ctcctagaaa aagcgtggct caaaagtaga taaaaaatag 180 
gcaactgcct aagtgtgaaa tttacaaagt tcctctccaa aaaagcccgc ctcctcccta 240 
tcacttgtgg gcctgacatt ttaccaaagg ggctctattc tttcaagagt ttgttattaa 300
agcgtgacta tttgaggatt ggaggcaaaa gggatactga gaaatgtcct tactagcagt 360
gtcaaggcaa gtgacataaa tgtgtggggg ggcaacttgt atgagcactg tgaaaacggc 420
agcatgttca ctctacttct cagctctgac tgaggggctc aaagttcagg atctgctgat 480
ttttcaacag taacgtcctc tccaaggtgt tttttttttt tccttttttg ggaaagcccc 540
cagtttaaac tattgcagcc agtttacatt tcttaatgtc actgtgctgg ccacattcag 600
agctccattt gccaccatcg gttttgatac ctttttacca aaacctttcg aaatttgaga 660
gcccatcttt agtaaaactg ggcatggagc agattcgttt ggattgctga gaggggagat 720
agaaaagttt gggtgctagg caggaactgc aaggaggacc tgggccatat gccagacatc 780
tagtgcctgg gccttgaaag ggagactggt cgctgacaag gcaatatctg ttgcaaccca 840
ggcttcctag atgaccacct tggatcatgg ctcggagcac agggagggct gggcagtgct 900
tgtgtttctc tccgttccag ttggcccctt cccattgaca ttacagtaat gcagttgtgt 960
gctgtttgaa aaagcatccc tagttacaca gaatgattta caggacacca gactctgcat 1020
ttcagaggtc tccagtgtac cataaaaaat atattataaa agaataatct ttatctgaac 1080
taaagctgca gtgaaggaaa ctcgtgtcca gctgagagca gcagtgagct tttgttcact 1140
cagggaaaag tccgtgttct ttatcttatt tgattaacat tttttttctc tttggcacta 1200
ggtatctcgt taatatttag acattatata cattttcttt caactggttt tcctattacg 1260
tgatcaaaca aaaccagaga tgccagccac agcagccaac aagggaaaag cagccccgtc 1320
aggcacccag ctgtggctgg ggggcaaatg gtactcacta gagccacccc caggaggcac 1380
ctggcagagc tctgtgcaga gccagccccg gttgcagaaa gctgagtttg ttggagtgcc 1440
tcagttgatc actctgtctc tttctcccat ttccctcact tccctgagca aaatgcaaca 1500
ggaagcaaag tctagttgtg aatcttccaa agccttctga tgtttaccat gttcccccag 1560
gagagggagg tgaggggtgg agatctctct gcaaagaaaa tacacttaaa aaatttcagc 1620
gagccgatgc acagacaccc agcaacccag cttgtctccg cttattaggt gttcagagcg 1680
acagtggtcc cacactattt cagtccagga aaccatgaac tccgttagtg gcaatgcccc 1740
cgaagaggcg caggtgtgtg cacctgtgat taagggtgtc gaggaggggc agcctcatct 1800
cttgaagcag aaagtgttgt cacctggtga tgggacagag ggaaaagctc tggggctggg 1860
aaacctgggg gcttgtgtca aagctccacc catcaggagc ttcaagagaa gatggggggc 1920
ggggggcggt ggctggaaag atggaagttg ggatgggaaa gcggttgtag aaaaggattc 1980
actcctggac cgaaggcagg aggatatccc gggcgagaga agggagggtc ggggatgggc 2040
tgagttggag tcccagagga aaagcggaag cgagagcttc gtcacccgct gtcttccagc 2100
tcccggtgcg cggcaccgga ggcaggcgtt gggctttacc tctctaaaag tactggggca 2160
aaggaatgga gaacacggcg tcccgagctc ccaagggagg ggagtaaacg aggtggggtg 2220
gggaacaccc caagtgcgtg cgtgctgggg ggctgggggg cacgatctcc gttctcccgg 2280
gtgccccagc cctagcgcac gcctccgctc ccccgccccc ttcgcaggcg cgcgcgaggc 2340
gcacccccct tccctcggcg gcgccgggcg cgcgcccggc cccctcctcc tcccctccgc 2400
gcctctcctc tctcccggca gaaagttagc agcggggaag gaactctggg ctgcaacagc 2460 
gcgcggcggc ggcggcagag gctgaagcag aagccgcggc ggagccgggg aagcgggggc 2520 
gctgcagacg gagcaggtgc cgccggcggg tccgcgcgcc cccctcggtc cccttgcctg 2580 
aggctgaggg gggggcggtg gtgggggggc cactcggact cggcgggcag cgtggggcgg 2640 
ggggccatgc ggccgggctc ccccctggcg cagcgggaca gcggccaggg ccgggggcgc 2700 
agcggcgtcg cttcatgcag ccggggcggc tgggcagcgg cggcggcggc ggcggcggcg 2760 
gcggcggggg cggcggctga aaccatgtcc gggcagcgcc gggggctgcc gccgccgccg 2820 
ccgccgccgc gagccgggag ccgcgatggc ccggtggccc gcacctcctc cgcctccgcc 2880 
tccgcctcca cctctggccg cgccgccgcc gcccggcgcc tctgctaagg ggccgccggc 2940 
gcgcaagctg ctttttatgt gcaccttgtc cctgtctgtc acctacctgt gctacagcct 3000 
cctgggcggc tcgggctccc tgcaattccc tctggcgctg caggagtcgc cgggcgccgc 3060 
cgccgagccc ccgccgagcc cgccgccacc ctctctgctg cctacccccg tgcgcctcgg 3120 
cgccccctcg cagccgcccg cgccgccgcc gctggacaac gcgagccacg gggagccgcc 3180 
cgagccccca gagcagccag ccgcccccgg gaccgacggc tgggggctgc cgagcggcgg 3240
cggaggcgcc cgggacgcct ggctccggac cccgctggcc cccagcgaga tgatcacggc 3300 
tcagagcgcg ctgccggaga gggaagcgca ggagtccagc accaccgacg aggatctcgc 3360
aggccggaga gcggccaacg ggagcagcga gaggggcggc gccgtcagca cccccgacta 3420 
tggggagaag aagctgccac aggcgctcat catcggggtc aagaaaggag ggacccgcgc 3480 
gctgctggag gcgatccgcg tgcacccgga cgtgcgggcg gtgggcgtag agccgcactt 3540
cttcgacagg aactacgaaa aggggttgga gtggtacagg taggaccctg ggctccgcgg 3600 
gctggtggag acgcgtgggg gagacgcgga ggggaagccg cggctttcca cgcccttcga 3660 
gcatccaggc accgtcccga gaggcccaag cccccgcgag ggctctgcaa accctggcgg 3720 
cgttgctcag ggggatcggc tgagagggct ggactccagc gaaaggtcac tttatttcag 3780 
ggcgagggga ggaggtgtca ccctgccctg cctcccgcgc tcctcatcca aggaggtgct 3840 
gtctgaatct gcccagctcc aagcctggga acccccagcc ctcctgcctg ctgggtgttt 3900 
ccgaaaccag gctcttgcgg ggttctggga ttctgggcag aggactttga ggagtgagac 3960 
aggatggcta aattgactaa ggggatttga ggtcccctgg aatctcttaa aatcaccctc 4020 
aaacgcattt gcgtggctgg aattc 4045



